FAPELANNTFVI DGPETEECPTANRAWNSMEVEDFGFGLTSTRMFLRI RETNTTECDS 
KIIGTAVKN^^yIAVHSDLSYWIESGLNDTWKLERAVLGEVKSCTWPETHTLWGDGVLE^ 
DLIIPITLAGPRSNHNRRPGYKTQNQGPWDEGRVEIDFDYCPGTTVTISDSCGHRGPA 
ARTTTESGKLITDWCCRSCTLPPLRFQTENGCWYGMEIRPTRHDEKTLVQSRVNAYNA 
DMI DP FQLGLLWFLATQEVLRKRWTAKI S I PAIMLALLVLVFGGI T YTDVLRYVI LV 
GAAFAEAN S GG DWH LALMAT FK I Q P VFLVAS FL KARWTNQ E S I LLMLAAAF FQMAY Y 
DAKNVLSWEVPDVLNSLSVAWMILRAISFTNTShTWVPLI^LTPGLKCLNLDVYRIL 
LLMVGVGS LI KEKRS SAAKKKGACLI CLALASTGVFNPMI LAAGLMACDPNRKRGWPA 
T EVMT AVGLMFAI VG G LAE L D I DSMAI PMT I AGLMFVAFVI S GK S T DMW I ERTAD I TW 
ESDAEITGSSERVDVRLDDDGNFQLMNDPGAPWKIWMLRMACLAISAYTPWAILPSVI 
GFWITLQYTKRGGVLWDTPSPKEYKKGDTTTGVYRIMTRGLLGSYQAGAGVMVEGVFH 
TLWHTTKGAALMSGEGRLDPYWGSVKEDRLCYGGPWKLQHKWNGHDEVQMIVVEPGKN 
VKNVQTKPGVFKTPEGEIGAVTLDYPTGTSGSPIVDKNGDVIGLYGNGVIMPNGSYIS 
AIVQGERMEEPAPAGFEPEMLRKKQITVLDLHPGAGKTRKILPQIIKEAINKRLRTAV 
IAPTRWAAEMSEALRGLPIRYQTSAVHREHSGNEIVDVMCHATLTHRLMSPHRVPNY 
NLFIMDEAHFTDPASIAARGYIATKVELGEAAAI FMTATPPGTSDPFPESNAPITDMQ 
TEIPDRAWNTGYEWITEYVGKTVWFVPSVKMGNEIALCLQRAGKKVIQLNRKSYETEY 
PKCKNDDWDFVITTDISEMGANFKASRVIDSRKSVKPTIIEEGDGRVILGEPSAITAA 
SAAQRRGRIGRNPSQVGDEYCYGGHTNEDDSNFAHWTEARIMLDNINMPNGLVAQLYQ 
PEREKVYTMDGEYRLRGEERKNFLEFLRTADLPVWLAYKVAAAGISYHDRKWCFDGPR 
TNTILEDNNEVEVITKLGERKILRPRWADARVYSDHQALKSFKDFASGKRSQIGLVEV 
LGRMPEHFMGKTWEALDTMYWATAEKGGRAHM 

VFFLLMQRKGIGKIGLGGVILGAATFFCWMADVPGTKIAGMLLLSLLLMIVLIPEPEK 
QRSQTDNQLAVFLICVLTLVSAVAANEMGWLDKTKNDISSLLGHKPEARETTLGVESF 
LLDLRPATAWSLYAVTTAVLTPLLKHLITSDYINTSLTSINVQASALFTLARGFPFVD 
VGVSALLIJ^GCWGQWLTWWAAALLFCHYAYMVPGWQAEAMRSAQRRTAAGIMKN 
AWDGI VAT D VP E L E RTT P VMQ KKVGQ I ML I L VSMAAVWN P S VRT VREAG I LTTAAA 
VTLWENGAS SVWNATTAIGLCHIMRGGWLSCLSITWTLIKNMEKPGLKRGGAKGRTLG 
EVWKERLNHMTKEEFTRYRKEAITEVDRSAAKHARREGNITGGHPVSRGTAKLRWLVE 
RRFLEPVGKVVDLGCGRGGWCYYMATQKRVQEVKGYTKGGPGHEEPQLVQSYGWNIVT 
MKSGVDVFYRPSEASDTLLCDIGESSSSAEVEEHRTVRVLEMVEDWLHRGPKEFCIKV 
LCPYMPKVIEKMETLQRRYGGGLVRNPLSRNSTHEMYWV^ 

GRMEKKTWKGPQFEEDVNLGSGTRAVGKPLLNSDTSKIKNRIERLKKEYSSTWHQDAN 
HP YRTWN YHGS YEVKPTGSAS S LVNGWRLLS KPWDT I TNVTTMAMTDTT P FGQQRVF 
KEKVDTKAPEPPEGVKYVLNETTNWLWAFIARD^ 
EEQNQWKNAREAVEDPKFWEMVDEEREAHLRGECNTC^ 

RAIWFMWLGARFLEFEALGFLNEDHWLGRKNSGGGVEGLGLQKLGYILKEVGTKPGGK 

IYADDTAGWDTRITKADLENEAKVLELLDGEHRRLARSIIELTYRHKVVKVMRPAADG 

KTWDVISREDQRGSGQVVTYALNTFTNLAVQLVRMMEGEGVIGPDDVEKLGKGKGPK 

VRTWLFENGEERLSRMAVSGDDCVVKPLDDRFATSLHFLNAMSKVRKDIQEWKPSTGW 

YDWQQVPFCSNHFTELIMKDGRTLWPCRGQDELI GRARI S PGAGWNVRDTACLAKS Y 

AQMWLLLYFHRRDLRLMANAI CSAVPVNWVPTGRTTWS I HAKGEWMTTEDMLSVWNRV 

WIEENEWMEDKTPVERWSDVPYSGKREDIWCGSLIGTRTRATWAENIHVAINQVRSVI 

GEEKYVDYMSSLRRYEDTIWEDTVL" 
mat_peptide 6916..7683 
mat_peptide 7684..10398 

/product="NS5 protein" 
misc_feature 2398..2469 

/note="NS1 signal peptide" 
3'UTR 10402..11048 

ORIGIN 

Query Match 87.4%; Score 1327.8; DB 10; Length 11048; 

Best Local Similarity 92.3%; Pred. No. 0; 

Matches 1398; Conservative 0; Mismatches 117; Indels 0; Gaps 6; 

Qy 1 CGGAATTCAGCTTCAACTGTTTAGGAATGAGCAACAGGGACTTCCTGGAGGGAGTGTCTG 60 

I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II 
Db 956 C AGCAT AC AGCT T CAACTGCTT AGGAAT GAGCAAC AGAGACTT C CT GGAGGGAGT GT CTG 1015 

Qy 61 GAGCT AC AT GGGT T GAT CT GGT ACT GGAAGGAGAC AGTT GT GT GAC CATAAT GT CAAAAG 120 

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1016 GAGCCAC AT GGGT T GAT CT GGT ACT GGAAGGC GAC AGCT GT GTAAC CATAAT GT CAAAAG 1075 

Qy 121 ACAAGCCAACCATT GATGT CAAAATGATGAACAT GGAAGCAGCTAAT CT CGCAGATGTGC . 180 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 
Db 1076 ACAAGCCAACCATT GATGT CAAAATGATGAACAT GGAAGCAGCTAAT CTTGCAGATGTGC 1135 

Qy 181 GT AGCT ACT GCTACTT AGCT TC GGT CAGT GAT CTGTCAACAAAAGCCGCGTGTCCAACCA 240 

I II I I I I I III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1136 GC AGTT ACT GT T AC CT AGCTT CAGT CAGT GACTT GT CAACAAGAGCC GC GT GT C CAAC CA 1195 

Qy 241 TGGGTGAAGCTCACAACGAGAAAAGAGCCGACCCTGCCTTTGTTTGCAAGCAAGGCGTCG 300 

I I I I I I I I I I I I I I I II I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I 
Db 1196 TGGGT GAAGCC CACAAT GAAAAAAGAGCT GAT CC CGC CT T C GT T TGCAAGCAAGGCGTTG 1255 

Qy 301 TAGACAGAGGAT GGGGGAAT GGATGCGGACTGTTTGGAAAGGGGAGCATT GACACAT GTG 360 

I I I I I I I I I I I II I I II I I I I I I I I I I I I II I I I I I I I I I I I I I II I I I I I I I I I I 
Db 1256 TAGATAGAGGAT GGGGAAACGGATGCGGACTGTTTGGAAAAGGGAGCATT GACACAT GTG 1315 

Qy 361 CAAAGTT T GC CT GT ACAAC CAAGGCAACT GGTT GGATT AT C C AGAAGGAAAAC AT C AAGT 420 

i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ii r I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1316 CGAAGT TT GCCT GT ACAAC CAAAGCGACT GGTT GGAT CAT CC AGAAGGAAAAC AT CAAGT 1375 

Qy 421 AC GAGGTT GCCAT ATTT GT GCAT GGCCC GAC GACT GT C GAAT CACAT GGCAATT ATT CAA 480 

I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 
Db 1376 AT GAGGTT GCCAT ATTT GT GCAT GGCCC GAC GACC GT T GAAT CT CAT GGT GATT ATT CAA 1435 



481 CACAGAT AGGGGCT AC C CAAGC AGGAAGGTT C AGC AT AACT CC AT CGGC AC CAT CCT ACA 540 
I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I Mill I II I 
1436 CACAGAT AGGGGCCAC C CAGGCT GGAAGATT C AGC AT AACT C C AT CGGC GC CAT CTT ACA 1495 

541 CGCT GAAGTT GGGT GAGT AT GGT GAGGT C AC AGTT GACTGT GAGC C ACGGT C AG GAAT AG 600 
I I I I I I I I I I I I I I I I I II I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 
1496 CGCTAAAGTT GGGT GAGT AT GGT GAGGT AAC GGTT GATT GT GAGCCACGGT CAGGAATAG 1555 

601 AC ACTAGCGCTT ACT AC GT TAT GTC AGT GGGT GC GAAGT CCTTCTTGGTT C AC C GAGAAT 660 
I II I I I I I I I II I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I 
1556 ACACTAGC GC CT ATT ACGT TAT GTC AGTT GGT GC GAAGT C CT T C CT GGT T C AC C GAGAAT 1615 

661 GGT TT AT GGAC CT GAACCT T CCAT GGAGT AGC GCT GGAAGCACAAC GT GGAGGAAC C GGG 720 
I I I I II I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I 
1616 GGT T CAT GGAT CT GAACCT GC CAT GGAGC AGT GCT GGAAGCAC C AC GT GGAGGAAT C GGG 1675 

721 AAAC ACT GAT GGAGTT T GAAGAAC CT C AT GC C AC CAAACAAT CT GT C GT AGCT CT AGGGT 780 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 
1676 AAAC ACT GAT GGAGT T T GAAGAACCT CAT GC C AC CAAACGAT CT GTT GT GGCT CT AGGGT 1735 

781 CGCAGGAAGGTGCCTTGCACCAAGCTCTGGCTGGAGCAATTCCTGTTGAGTTCTCAAGCA 840 

I I II I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I 
1736 CGCAGGAAGGCGCTTTGCACCAAGCTCTGGCCGGAGCGATTCCTGTTGAATTCTCAAGCA 1795 

841 AC ACT GT GAAGT T GAC AT CAGGAC AT CT GAAGT GT AGGGT GAAGAT GGAGAAGTT GC AGC 900 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
1796 AC ACT GT GAAGT TGAC AT CAGGAC AT CT GAAGT GT AGGGT GAAGAT GGAGAAGTT GC AGC 1855 

901 T GAAGGGAACAACAT AT GGT GTATGCTCAAAAGCATT CAAATTCGCTAGGACTCCCGCTG 960 
I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I 
1856 T GAAGGGAACAACAT AC GGAGT AT GTTCAAAAGC GTT CAAATT C GCT GGGACT CCT GCT G 1915 

961 ACACT GGTC AT GGAAC GGT GGT GCT GGAACT GCAGTAT ACC GGAAAAGAC G GGCCTT GC A 1020 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 
1916 ACACT GGCCATGGAACGGT GGT GTT GGAACT GCAGTACACC GGAAC GGAC GGT CCCTGC A 1975 

1021 AAGTGCCCATTTCTTCTGTGGCTTCCCTGAACGACCTTACACCCGTTGGAAGGCTGGTGA 1080 

I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
1976 AAGTGCCCATCTCTTCCGTAGCTTCCCTGAATGACCTCACACCTGTTGGAAGACTGGTAA 2035 

1081 CTGT GAAT CCATTTGTGTCTGTGGCTACGGCCAACTCGAAGGTTTT GAT TGAACTCGAAC 1140 

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
2036 CAGT GAAT CCAT T T GT GT CT GT GGC C AC GGCCAACTC GAAGGTT TT GATT GAACT C GAAC 2095 

1141 C CCC GTTT AGT GACT CTT ACAT C GT GGT GGGGAGAGGAGAACAG CAGATAAAC CAC C ACT 1200 

I I I I I I I II I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
2096 CCCCGTTCGGT GACT C CT ACAT C GT GGT GGGAAGAGGAGAAC AGC AGAT AAAC CAT C ACT 2155 

1201 GGC ACAAATCT GGGAGCAGT ATT GGAAAGGCTTT C AC CACT ACACT CAGAGGAGCT CAAC 1260 

I II II I I II I I I I I I I I I I I I I I I I I I I I II II M I I I I I I I I I I I I I I I I I I I 
2156 GGCACAAAT C C GGGAGC AGCAT T GGAAAGGC CTT T ACT ACC AC ACT CAGAGGAGCT CAAC 2215 

1261 GACTTGCAGCTCTTGGAGACACTGCCTGGGATTTTGGATCAGTCGGAGGGGTTTTCACCT 1320 

I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I MINIM I I I I I I I 
2216 GACTTGCAGCTCTTGGAGACACTGCTTGGGATTTTGGATCAGTTGGAGGGGTATTCACCT 2275 



Qy 1321 CGGTAGGGAAAGCCATACACCAAGTTTTTGGAGGAGCCTTTAGATCACTCTTTGGAGGGA 1380 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2276 C GGT GGGGAAAGCT AT ACAC CAAGT CTTT GGAGGAGCTTT T AGAT CACTT T TT GGAGGGA 2335 



Qy 1381 TGTCCTGGATCACACAGGGGCTTCTGGGAGCTCTTCTGCTGTGGATGGGAATTAACGCCC 1440 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 
Db 2336 TGTCCTGGATCACACAGGGACTTCTGGGAGCTCTTCTGTTGTGGATGGGAATCAATGCCC 2395 

Qy 1441 GTGACAGGTCAATTGCTATGACGTTCCTTGCGGTTGGAGGAGTCTTGCTCTTCCTTTCGG 1500 

I I I I II I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2396 GTGACAGGTCAATTGCTATGACGTTCCTTGCGGTTGGAGGAGTTTTGCTCTTCCTTTCGG 2455 

Qy 1501 TCAACGTCCATGCTG 1515 

I I I I I I I I I I I I I I 
Db 2456 TCAACGTCCACGCTG 2470 



RESULT 4 
AY532665 
LOCUS AY532665 11038 bp RNA linear VRL 09-DEC-2004 

DEFINITION West Nile virus strain B956 polyprotein gene, complete genome. 
ACCESSION AY532665 

VERSION AY532665.1 GI:56462533 

KEYWORDS 

SOURCE West Nile virus (WNV) 

ORGANISM West Nile virus 

Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae; 
Flavivirus; Japanese encephalitis virus group. 
REFERENCE 1 (bases 1 to 11038) 

AUTHORS Yamshchikov,G., Borisevich,V., Seregin,A., Chaporgina,E., 
Mishina,M., Mishin,V., Wai Kwok,C. and Yamshchikov,V. 

TITLE An attenuated West Nile prototype virus is highly immunogenic and 
protects against the deadly NY99 strain: a candidate for live WN 
vaccine development 

JOURNAL Virology 330 (1), 304-312 (2004) 
PUBMED 15527855 
REFERENCE 2 (bases 1 to 11038) 

AUTHORS Borisevich,V.G. and Yamshchikov,V.F. 

TITLE Molecular basis of attenuation of the West Nile virus prototype 
strain B956 

JOURNAL Unpublished 
REFERENCE 3 (bases 1 to 11038) 

AUTHORS Borisevich,V.G. and Yamshchikov,V.F. 

TITLE Direct Submission 

JOURNAL Submitted (23-JAN-2004) Molecular Biosciences, University of 
Kansas, 1200 Sunnyside Ave., Lawrence, KS 66045, USA 
FEATURES Location/Qualifiers 
source 1..11038 

/organism="West Nile virus" 
/mol_type="genomic RNA" 
/strain="B956" 
/db_xref="taxon:11082" 

/note="obtained from R. Shope, Galveston, TX at suckling 
mouse brain passage 2; passaged once in C6/36 cells" 
CDS 97..10389 

/codon_start=1 

/product="polyprotein" 

/protein_id="AAT02759.1" 

/db_xref="GI:56462534" 

/translation="MSKKPGGPGKNRAVNMLKRGMPRGLSLIGLKRAMLSLIDGKGPI 

RFVIiALLAFFRFTAIAPTRAVLDRWRGWKQTAMKHLLSFKKELGTLTSAINRRSTKQ 

KKRGGTAGFTILLGLIACAGAVTLSNFQGKViyDyiTWATDVTDVITIPTAAGKNLCIVR 

AMDVGYLCEDTITYECPVLAAGNDPEDIDCWCTKSSVYVRYGRCTKTRHSRRSRRSLT 

VQTHGESTLANKKGAWLDSTKATRYLVKTESWILRNPGYALVAAVIGWMLGSNTMQRV 

VFAILLLLVAPAYSFNCLGMSNRDFLEGVSGATWVDLVLEGDSCVTLMSKDKPTIDVK 

MMNMEAANLADVRSYCYLASVSDLSTRAACPTMGEAHNEKRADPAFVCKQGVW 

NGCGLFGKGSIDTCAKFACTTKATGWIIQKENIKYEVAIFVHGPTTVESHGKIGATQA 

GRFSITPSAPSYTLKLGEYGEVTVDCEPRSGIDTSAYYVMSVGAKSFLVHREWFMDLN 

LPWSSAGSTTWRNRETLMEFEEPHATKQSVVALGSQEGALHQALAGAI PVEFSSNTVK 

LTSGHLKCRVKMEKLQLKGTTYGVCSKAFKFARTPADTGHGTWLELQYTGTDGPCKV 

PISSVASLNDLTPVGRLVTVNPFVSVATANSKVLIELEPPFGDSYIWGRGEQQINHH 

WHKS GS S I GKAFTTTLRGAQRLAALGDTAWDFGS VGGVFT SVGKAI HQVFGGAFRS LF 

GGMSWITQGLLGALLLWMGINARDRSIAMTFLAVGGVLLFLSVNVHADTGC^ 

ELRCGSGVFIHNDVEAWMDRYKFYPETPQGLAKIIQKAHAEGVCGLRSVSRLEHQMWE 

AI KDELNTLLKENGVDLSVVVEKQNGMYKAAPKRLAATTEKLEMGWKAWGKS 1 1 FAPE 

LANNTFVIDGPETEECPTANRAWNSMEVEDFGFGLTSTRMFLRIRETNTTECDSKIIG 

TAVKNNMAVHSDLSYWIESGLNDTWKLERAVLGEVKSCTWPETHTLWGDGVLESDLII 

PITLAGPRSNHNRRPGYKTQNQGPWDEGRVEIDFDYCPGTTVTISDSCGHRGPAARTT 

TESGKLITDWCCRSCTLPPLRFQTENGCWYGMEIRPTRHDEKTLVQSRVNAYNADMID 

PFQLGLLWFIATQEVLRKRWTAKI SI PAIMLALLVLVFGGITYTDVLRYVILVGAAF 

AEANSGGDWHLALMATFKIQPVFLVAS FLKARWTNQES I LLMLAAAFFQMAYYDAKN 

VLSWEVPDVLNSLSVAWMILRAISFTNTSNVWPLI^LTPGLKCLNLDVYRILLIJytV 

GVG S L I KEK RS S AAKKKGAC L I C LALAS T GVFN PMI LAAGLMAC D PN RKRGW PAT EVM 

TAVGLMFAI VGGLAELDI DSMAI PMT I AGLMFVAFVI S GKSTDMWI ERTADI TWESDA 

EITGSSERVDVRLDDDGNFQLMNDPGAPWKIWMLRMACLAI SAYTPWAILPSVIGFWI 

TLQYTKRGGVLWDTPSPKEYKKGDTTTGWRIMTRGLLGSYQAGAGVMVEGVFHTLWH 

TTKGAALMSGEGRLDPYWGSVKEDRLCYGGPWKLQHKWN 

QTKPGVFKTPEGEIGAVTLDYPTGTSGSPIVDKNGDVIGLYGNGVIMPNGSYISAIVQ 

GERMEEPAPAGFEPEMLRKKQITVLDLHPGAGKTRKILPQIIKEAINKRLRTAVLAPT 

RWAAEMSEALRGLPIRYQTSAVHREHSGNEIVDVMCHATLTHRLMSPHRVPNYNLFI 

MDEAHFTDPASIAARGYIATKVELGEAAAI FMTATPPGTSDPFPESNAPISDMQTEIP 

DRAWNTGYEWITEWGKTWFVPSVKMGNEIALCLQRAGKKVIQLNRKSYETEYPKCK 

NDDWDFVITTDISEMGANFKASRVIDSRKSVKPTIIEEGDGRVILGEPSAITAASAAQ 

RRGRIGRNPSQVGDEYCYGGHTNEDDSNFAHWTEARIMLDNINMPNGLVAQLYQPERE 

KVYTMDGEYRLRGEERKNFLEFLRTADLPWIAYKVAAAGISYHDRKWCFDGPRTNTI 

LEDNNEVEVITKLGERKILRPRWADARVYSDHQALKSFKDFASGKRSQIGLVEVLGRM 

PEHFMGKTWEALDTMYVVATAEKGGRAHRMALEELPDALQTIALIALLSVMSLGVFFL 

LMQRKGI GKIGLGGI I LGAATFFCWMAEVPGTKI AGMLLLSLLLMI VLI PEPEKQRSQ 

TDNQLAVFLICVLTLVGAVAANEMGWLDKTKNDISSLLGHKPEARETTLGVESFLLDL 

RP AT AW S L YAVT T AVLT PLLKHLITSDYINTSLTSI NVQAS AL FT LARG F P FVDVG VS 

ALLLAAGCWGQWLTVTWAAAIjLFCHYAYMVPGWQAEAMRSAQRRTAAGIM 

G I VAT DVP E L E RT T P VMQ K KVGQ 1 1 L I LVSMAAVWN P S VRT VREAG I LT T AAAVT LW 

ENGASSVWNATTAIGLCHIMRGGWLSCLSIMWTLIKNMEKPGLKRGGAKGRTLGEVWK 

ERLNHJyiTKEEFTRYRKEAITEVDRSAAKHARREGNITGGHPVSRGTAKLRWLVERRFL 

EPVGKVVDLGCGRGGWCYYMATQKRVQEVKGYTKGGPGHEEPQLVQSYGWNIVTMKSG 

VDVFYRPSEASDTLLCDI GES S S SAEVEEHRTVRVLEMVEDWLHRGPKEFCI KVLCP Y 

MPKVIEKMETLQRRYGGGLVRNPLSRNSTHEMYWSHASGNIVHSVNMTSQV^ 

KKTWKGPQFEEDVNLGSGTRAVGKPLLNSDTSKIKNRIERLKKEYSSTWHQDANHPYR 

TWNYHGSYEVKPTGSASSLWGVVRLLSKPWDTIT^^VTTMA^^^DTTPFGQQRVFKEK^ 

DTKAPEPPEGVKYVLNETTNWLWAFLARDKKPRMCSREEFIGKVNSNAALGAMFEEQN 

QWKNAREAVEDPKFWEMVDEEREAHLRGECNTCIYNMMGKREKKPGEFGK^ 

FMWLGARFLEFE^GFLNEDHWLGRKNSGGGVEGLGLQKLGYILKEVGTKPGGKVYAD 

DTAGWDTRITKADLENEAKVLELLDGEHRRLARSI I ELTYRHKWKVMRPAADGKTVM 

DVISREDQRGSGQVWYALNTFTNIAVQLVRMMEGEGVIGPDDVEKLGKGKGPKVRTW 



LFENGEERLSRMAVSGDDCWKPLDDRFATSLHFLNAMSKVRKDIQEWKPSTGWYDWQ 
QVPFCSNHFTELIMKDGRTLWPCRGQDELIGRARISPGAGWNVRDTACLAKSYAQMW 
LLLYFHRRDLRLMANAICSAVPANWPTGRTTWSIHAKGEWMTTEDMIA 
NEWMEDKTPVERWSDVPYSGKREDIWCGSLIGTRTRATWAENIHVAINQVRSVIGEEK 
YVDYMS S LRRYEDTI WEDTVL " 

ORIGIN 

Query Match 86.9%; Score 1321; DB 10; Length 11038; 

Best Local Similarity 92.6%; Pred. No. 0; 

Matches 1403; Conservative 0; Mismatches 100; Indels 12; Gaps 1; 



Qy 1 C GGAAT T CAGCTT CAACT GTT T AGGAAT GAGCAACAGGGACT TC CT GGAGGGAGTGT CT G 60 

I I II I II I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I 
Db 956 CAGC AT AC AGCTT CAACT GCT T AGGAAT GAGT AACAGAGACT T CCT GGAGGGAGTGT CT G 1015 

Qy 61 GAGCT ACAT GGGTT GATCT GGT ACT GGAAGGAGACAGTT GT GT GAC C ATAAT GT CAAAAG 120 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1016 GAGCT AC AT GGGTT GATCT GGT ACT GGAAGGCGATAGTTGT GTGACCCTAAT GT CAAAAG 1075 

Qy 121 ACAAGC CAACCATT GAT GT CAAAAT GAT GAACAT GGAAGC AGCTAAT CT CGCAGAT GT GC 180 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I 
Db 1076 ACAAGC CAACCATT GATGT CAAAAT GAT GAACAT GGAAGCAGCCAACCTCGCAGATGT GC 1135 

Qy 181 GTAGCTACTGCTACTTAGCTTCGGTCAGTGATCTGTCAACAAAAGCCGCGTGTCCAACCA 240 

I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I II I I I I I I I I I I I I I I I II 
Db 1136 GCAGTT ACT GTT AC CT AGCTT C GGT CAGT GACT T GTCAACAAGAGCT GC GT GT C CAAC CA 1195 

Qy 241 TGGGTGAAGCTCACAACGAGAAAAGAGCCGACCCTGCCTTTGTTTGCAAGCAAGGCGTCG 300 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1196 TGGGTGAAGCCCACAACGAGAAAAGAGCTGACCCCGCCTTCGTTTGCAAGCAAGGCGTTG 1255 

Qy 301 T AGACAGAGGAT GGGGGAAT GGAT GC GGACT GT T T GGAAAGGGGAGC ATT GAC ACATGT G 360 

I I I I I I I I I I I I II I I I I I I I I II I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 
Db 1256 T GGACAGAGGATGGGGAAAT GGCT GC GGACTGTT T GGAAAGGGGAGC ATT GACACAT GT G 1315 

Qy 361 CAAAGTTT GCCTGTACAACCAAGGCAACTGGTT GGATTATCCAGAAGGAAAACATCAAGT 420 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1316 CGAAGTTT GCCTGTACAACCAAAGCAACTGGAT GGATCATCCAGAAGGAAAACATCAAGT 1375 

Qy 421 ACGAGGTT GCC AT AT TTGTGCAT GGC CC GACGACT GT CGAAT CACAT GGCAATT ATT CAA 480 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 
Db 1376 AT GAGGT T GC CAT AT T T GT GCAT GGCC C GACGACC GTT GAAT CT CAT GGCA 1426 

Qy 481 CACAGATAGGGGCTACCCAAGCAGGAAGGTTCAGCATAACTCCATCGGCACCATCCTACA 540 

I I I I I I I II I I I I I I II Mill I I I II I I I I I I I I I II I I I I I I I I I II I 
Db 1427 AGATAGGGGCCACCCAGGCT GGAAGATTCAGTATAACTCCAT CGGCGCCAT CTTACA 1483 

Qy 541 CGCTGAAGTTGGGT GAGTATGGT GAGGTCACAGTTGACTGTGAGCCACGGT CAGGAATAG 600 

I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I 
Db 1484 CGCTAAAGTTGGGTGAGTAT GGT GAGGTTACGGTT GATT GTGAGCCACGGT CAGGAATAG 1543 

Qy 601 ACACTAGCGCTTACTACGTTATGTCAGTGGGTGCGAAGTCCTTCTTGGTTCACCGAGAAT 660 

I I I I I I I I I I II I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1544 ACACTAGCGCCTATTACGTTATGTCAGTTGGTGCGAAGTCCTTCCTGGTTCACCGAGAAT 1603 

Qy 661 GGTTTATGGACCTGAACCTTCCATGGAGTAGCGCTGGAAGCACAACGTGGAGGAACCGGG 720 

I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 



Db 1604 GGT TT AT GGAT CT GAACCT GCCATGGAGCAGT GCT GGAAGCAC CAC GTGGAGGAACCGGG 1663 

Qy 721 AAACACT GAT GGAGT TT GAAGAACCT C AT GC CAC C AAACAAT CT GT C GT AGCT CT AGGGT 780 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 
Db 1664 AAACACT GAT GGAGT TT GAAGAACCT CAT GC CAC CAAACAAT CT GTT GT GGCT CT AGGGT 1723 

Qy 781 CGCAGGAAGGTGCCTTGCACCAAGCTCTGGCTGGAGCAATTCCTGTTGAGTTCTCAAGCA 840 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
Db 1724 CGCAGGAAGGTGCGTTGCACCAAGCTCTGGCCGGAGCGATTCCTGTTGAGTTCTCAAGCA 1783 

Qy 841 ACACT GT GAAGTT GACATCAGGACAT CT GAAGT GT AGGGT GAAGAT GGAGAAGT T GC AGC 900 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 
Db 1784 ACACT GT GAAGTT GACATCAGGACAT CT GAAGT GT AGGGT GAAGAT GGAGAAGT T GC AGC 1843 

Qy 901 TGAAGGGAACAACATATGGTGTATGCTCAAAAGCATTCAAATTCGCTAGGACTCCCGCTG 960 

I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1844 TGAAGGGAACAACATATGGAGTATGTTCAAAAGCGTTCAAATTCGCTAGGACTCCCGCTG 1903 

Qy 961 ACACT GGTCATGGAACGGTGGTGCTGGAACTGCAGTATACCGGAAAAGACGGGCCTTGCA 1020 

I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 
Db 1904 ACACTGGCCACGGAACGGTGGTGTTGGAACTGCAATATACCGGAACAGACGGTCCCTGCA 1963 

Qy 1021 AAGTGCCCATTTCTTCTGTGGCTTCCCTGAACGACCTTACACCCGTTGGAAGGCTGGTGA 1080 

I I I I I I I I I I I I I I I I II I I I I I I I I I I I II I I I I I I I I I I I II I I I I I I I I I I 
Db 1964 AAGTGCCCATTTCTTCCGTAGCTTCCCTGAATGACCTCACACCTGTTGGAAGACTGGTGA 2023 

Qy 1081 CTGTGAATCCATTTGTGTCTGTGGCTACGGCCAACTCGAAGGTTTTGATTGAACTCGAAC 1140 

I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2024 CCGTGAATCCATTTGTGTCTGTGGCCACAGCCAACTCGAAGGTTTTGATTGAACTCGAAC 2083 

Qy 1141 CCCCGTTTAGT GACT CTTACATCGTGGTGGGGAGAGGAGAACAGCAGATAAACCACCACT 1200 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2084 C CCCGTT T GGT GACT CT TACATC GT GGTGGGAAGAGGAGAACAGCAGATAAAC CAT C ACT 2143 

Qy 1201 GGCACAAAT CT GGGAGCAGTATT GGAAAGGCTT T CAC CAC T ACACT CAGAGGAGCT CAAC 1260 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2144 GGCACAAAT CT GGGAGCAGCATT GGAAAGGC CT TT AC CAC CACACTCAGAGGAGCT CAAC 2203 

Qy 1261 GACTTGCAGCTCTTGGAGACACTGCCTGGGATTTTGGATCAGTCGGAGGGGTTTTCACCT 1320 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2204 GACTCGCAGCTCTTGGAGATACTGCTTGGGATTTTGGATCAGTTGGAGGGGTTTTCACCT 2263 

Qy 1321 CGGTAGGGAAAGCCATACACCAAGTTTTTGGAGGAGCCTTTAGATCACTCTTTGGAGGGA 1380 

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
Db 2264 C AGTGGG GAAAGCCATACAC CAAGT CTTT GGAGGAGCT T T T AGAT C ACT CTT T GGAGGGA 2323 

Qy 1381 TGTCCTGGATCACACAGGGGCTTCTGGGAGCTCTTCTGCTGTGGATGGGAATTAACGCCC 1440 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 
Db 2324 TGTCCTGGATCACACAGGGACTTCTGGGAGCTCTTCTGTTGTGGATGGGAATCAATGCCC 2383 

Qy 1441 GTGACAGGTCAATTGCTATGACGTTCCTTGCGGTTGGAGGAGTCTTGCTCTTCCTTTCGG 1500 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I 
Db 2384 GTGACAGGTCAATTGCTATGACGTTTCTTGCGGTTGGAGGAGTTTTGCTCTTCCTTTCGG 2443 

Qy 1501 TCAACGTCCATGCTG 1515 

I I I I I I I II I I I I I I 
Db 2444 TCAACGTCCATGCTG 2458 



RESULT 5 
DQ318019 

LOCUS DQ318019 11038 bp mRNA linear VRL 01-JAN-2006 

DEFINITION West Nile virus strain ArD76104, complete genome. 
ACCESSION DQ318019 

VERSION DQ318019.1 GI:84028432 

KEYWORDS 

SOURCE West Nile virus (WNV) 

ORGANISM West Nile virus 

Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae; 
Flavivirus; Japanese encephalitis virus group. 
REFERENCE 1 (bases 1 to 11038) 

AUTHORS Borisevich,V.G., Seregin,A.V. and Yamshchikov,V.F. 

TITLE Genetic determinants of West Nile virus pathogenicity 

JOURNAL Unpublished 
REFERENCE 2 (bases 1 to 11038) 

AUTHORS Borisevich,V.G. and Yamshchikov,V.F. 

TITLE Direct Submission 

JOURNAL Submitted (07-DEC-2005) Molecular Biosciences, 1200 Sunnyside ave, 
Lawrence, KS 66045, USA 
FEATURES Location/Qualifiers 
source 1..11038 

/organism="West Nile virus" 
/mol_type="mRNA" 
/strain="ArD76104" 
/db_xref="taxon:11082" 
/country="Senegal" 

/note="lineage 2; SMB pass 3, C6/36 pass 1" 
5'UTR 1..96 

CDS 97..10389 

/codon_start=1 

/product="polyprotein" 

/protein_id="ABC49716.1" 

/db_xref="GI:84028433" 

/translation="MSKKPGGPGKNRAVNMLKRGMPRGLSLIGLKRAMLSLIDGKGPI 

RFVLALLAFFRFTAIAPTRAVLDRWRGVNKQTAMKHLLSFKKELGTLTSAINRRSTKQ 

KKRGGTAGFTILLGLIACAGAWLSNFQGKVMMTVNATDVTDVITIPTAAGKNLCIVR 

AMDVG YLCEDT I T YEC PVLAAGND P ED I DCWCT KS S VYVRYGRCT KT RH S RRS RRS LT 

VQTH GE S T LAN KKGAWLD S T KAT RYLVKT E S WI L RN P G YALVAAVI GWML G SNTMQ RV 

VFAILLLLVAPAYSFNCLGMSNRDFLEGVSGATWVDLVLEGDSCVTIMSKDKPTIDVK 

MMNMEAANLADVT^SYCYLASVSDLSTRAACPTMGEAHNEKRADPAFV^ 

NGCGLFGKGSIDTCAKFACTTKATGWIIQKENIKYEVAIFVHGPTTVESHGKIGATQA 

GRFSITPSAPSYTLKLGEYGEVTVDCEPRSGIDTSAYYVMSVGAKSFLVHREWFMDLN 

LPWSSAGSTTWRNRETLVEFEEPHATKQSWALGSQEGALHQALAGAIPVEFSSNTVK 

LTSGHLKCRVKMEKLQLKGTTYGVCSKAFKFARTPADTGHGTWLELQYTGTDGPCKV 

PIS S VAS LNDLT PVGRLVTVNP EVS VATANS KVLI ELEP P FGDS YI WGRGEQQINHH 

WHKSGSSIGKAFTTTLRGAQRLAALGDTAWDFGSVGGVFTSVGKAIHQVFGGAFRSLF 

GGMSWITQGLLGALLLWMGINARDRSIAMTFLAVGGVLLFLSVTTVHADTGCAIDIGRQ 

ELRCGSGVFIHNDVEAWMDRYKFYPETPQGLAKIIQKAHAEGVCGLRSVSRLEHQMWE 

AIKDELNTLLKENGVTDLSVWEKQNGMYKAAPKRLAATTEKLEMGWKAWGKSIIFAPE 

LANNTFVIDGPETEECPTANRAWNSMEVEDFGFGLTSTRMFLRIRETNTTECDSKIIG 

TAVT^NNMAvliSDLSYWIESGLNDTWKLERAVljGEWSCTWPETHTLWGDGVLESDLII 

PITLAGPRSNHNRRPGYKTQNQGPWDEGRVEIDFDYCPGTTVTISDSCGHRGPAARTT 

TESGKLITDWCCRSCTLPPLRFQTENGCWYGMEIRPTRHDEKTLVQSRVNAYNADMID 

PFQLGLLWFLATQEVXRKRWTAKI SI PAIMLALLVLVFGGITYTDVLRYVI LVGAAF 



AEANSGGDWHLALMATFKIQPVFLVAS FLKARWTNQES I LLMLAAAFFQMAYYDAKN 

VLSWEVPDVLNSLSVAWMILRAISFTNTSNVWPLI^ 

GVGSLIKEKRSSAAKKKGACLICLALASTGVFNPMII^^ 

TAVGLMFAI VGGLAELDI DSMAI PMT I AGLMFVAFVI S GKSTDMWI ERTADI TWES DA 

EITGSSERVDVRLDDDGNFQI>INDPGAPWKIWMLRMACLAI SAYTPWAILPSVIGFWI 

TLQYTKRGGVLWDTPSPKEYKKGDTTTGVYRIMTRGLLGSYQAGAGVMVEGVFHTLWH 

TTKGAALMSGEGRLDPYWGSVKEDRLCYGGPWKLQHKWNGHDEVQMIV^ 

QTKPGVFKTPEGEIGAVTLDYPTGTSGSPIVDKNGDVIGLYGNGVIMPNGSYISAIVQ 

GERMEEPAPAGFEPEMLRKKQITVLDLHPGAGKTRKILPQIIKEAINKRLRTAVLAPT 

RWAAEMSEALRGLPIRYQTSAVHREHSGNEIVDVMCHATLTHRLMSPHRVPNYNLFI 

MDEAHFTDPASIAARGYIATKVELGEAAAIFMTATPPGTSDPFPESNAPISDMQTEIP 

DRAWNTGYEWITEYVGKTVWFVPSVKMGNEIALCLQRAGKKVIQLNRKSYETEYPKCK 

NDDWDFVITTDISEMGANFKASRVIDSRKSVKPTIIEEGDGRVILGEPSAITAASAAQ 

RRGRIGRNPSQVGDEYCYGGHTNEDDSNFAHWTEARIMLDNINMPNGLVAQLYQPERE 

KVYTMDGEYRLRGEERKNFLEFLRTADLPVWLAYKVAAAGISYHDRKWCFDGPRTNTI 

LEDNNEVEVITKLGERKI LRPRWADARVYSDHQALKS FKDFASGKRSQI GLVEVLGRM 

PEHFMGKTWEAIiDTMYAA/ATAEKGGRAHRMALEELPDALQTIALIALLSVMSLGVFFL 

LMQRKGIGKIGLGGVILGAATFFCWMAEVPGTKIAGMLLLSLLLMIVLIPEPEKQRSQ 

TDNQLAVFLICVLTLVSAVAANEMGWLDKTKNDIGSLLGHKPEARETTLGVESFLLDL 

RP AT AW S L YAVT T AVLT PLLKHLITSDYINTSLTSI NVQ AS AL FT LARG F P FVD VGVS 

ALLLAAGCWGQVT LT VTVTAAAL L FCH YAYMVP GWQAEAMRS AQ RRTAAG I MKN AWD 

G I VAT DVP ELE RTT P VMQ KKVGQ IML I LVSMAAVWN P S VRT VREAG I LT TAAAVT LW 

ENGASSVWNATTAIGLCHIMRGGWLSCLSIMWTLIKNMEKPGLKRGGAKGRTLGEVWK 

ERLNHiyiTKEEFTRYRKEAITEVDRSAAKHARREGNITGGHPVSRGTAKLRWLVERRFL 

EPVGKVVDLGCGRGGWCYYMATQKRVQEVKGYTKGGPGHEEPQLVQSYGWNIVTMKSG 

VDVFYRPSEASDTLLCDIGESSSSAEVEEHRTVRVLEMVEDWLHRGPKEFCIKVLCPY 

MPKVIEKMETLQRRYGGGLVRNPLSRNSTHEMYWSHASGNIVHSVNMTSQVLLG^ 

KKTWKGPQFEEDVNLGSGTRAVGKPLLNSDTSKIKNRIERLKKEYSSTWHQDANHPYR 

TWN YHGS YEVKPTGSAS S LVNGWRLLS KPWDT I TNVTTMAMTDTT PFGQQRVFKEKV 

DTKAPEPPEGVKYVLNETTNWLWAFLARD^^ 

QWKNAREAVEDPKFWEMVT)EEREAHLRGECNTCIYN>IMGKREKKPGEFGKAKGSRAIW 
FMWLGARFLEFEALGFLNEDHWLGRKNSGGGVEGLGLQKLGYILKEVGTKPGGKVYAD 
DTAGWDTRITKADLENEAKVLELLDGEHRRLARSIIELTYRHKVVTWMRPAADGKTVM 
DVISREDQRGSGQVVTYALNTFTNLAVQLVRMMEGEGVIGPDDVEKLGKGKGPKVRTW 
LFENGEERLSRMAVSGDDCWKPLDDRFATSLHFLNAMSKVRKDIQEWKPSTGWYDWQ 
Q VP FC S NH FT E L I MKDGRT LWP C RGQ DELI G RARI S P GAGWNVRDT AC LAKS YAQMW 
LLLYFHRRDLRLMANAICSAVPVNWVPTGRTTWSIHAK^ 

NEWMEDKTPVERWSDVPYSGKREDIWCGSLIGTRTRATWAENIHVAINQVRSVIGEEK 

YVDYMS S LRRYEDT I WEDTVL " 
mat_peptide 97..411 

/product="C protein" 
sig_peptide 412..465 

/note="prM signal peptide" 
mat_peptide 466..966 

/product="prM protein" 
mat_peptide 466..741 

/product="cleaved amino terminal prM fragment" 
mat_peptide 742..966 

/product="M protein" 
sig_peptide 919..966 

/note="E signal peptide" 
mat_peptide 967..2457 

/product="E protein" 
sig_peptide 2386..2457 

/note="NS1 signal peptide" 
mat_peptide 2458..3513 

/product="NS1 protein" 



mat_peptide 3514..4206 

/product="NS2A protein" 
mat_peptide 4207..4599 

/product="NS2B protein" 
mat_peptide 4600..6456 

/product="NS3 protein" 
mat_peptide 6457..6903 

/product="NS4A protein" 
sig_peptide 6835..6903 
/note="2K peptide" 
mat_peptide 6904..7671 

/product="NS4B protein" 
mat_peptide 7672..10386 
/product="NS5 protein" 
3'UTR 10390..11038 
ORIGIN 



Query Match 86.9%; Score 1321; DB 10; Length 11038; 

Best Local Similarity 92.6%; Pred. No. 0; 

Matches 1403; Conservative 0; Mismatches 100; Indels 12; Gaps 1; 



Qy 1 CGGAATTCAGCTTCAACTGTTTAGGAATGAGCAACAGGGACTTCCTGGAGGGAGTGTCTG 60 

I I II I I I I II I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 
Db 956 CAGCAT AC AGCT T CAACTGCTT AGGAAT GAGTAAC AGAGACT T C CT GGAGGGAGT GT CT G 1015 

Qy 61 GAGCT ACAT GGGT T GAT CT GGT ACT GGAAGGAGACAGTT GT GT GACCATAAT GT CAAAAG 120 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II I I I I I I I I I I I I I I I 
Db 1016 GAGCT ACAT GGGT T GAT CT GGT ACT GGAAGGCGAT AGTT GT GT GACCATAAT GT CAAAAG 1075 



Qy 121 ACAAGCCAACCATT GAT GTCAAAATGAT GAACATGGAAGCAGCTAATCTCGCAGATGTGC 180 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I 
Db 1076 ACAAGCCAACCATT GAT GTCAAAATGAT GAACATGGAAGCAGCCAACCTCGCAGATGTGC 1135 



Qy 181 GT AGCT ACT GCT ACT T AGCT TC GGT CAGT GAT CTGTCAACAAAAGCCGCGTGTCCAACCA 240 

I II I I I I I III I II I I I I I I I I I I I I I I I I I I I I I I III I I I I I I I I I I I I I 
Db 1136 GCAGTTACTGTTACCTAGCTTCGGTCAGTGACTTGTCAACAAGAGCTGCGTGTCCAACCA 1195 

Qy 241 TGGGTGAAGCTCACAACGAGAAAAGAGCCGACCCTGCCTTTGTTTGCAAGCAAGGCGTCG 300 

I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1196 TGGGTGAAGCCCACAACGAGAAAAGAGCTGACCCCGCCTTCGTTTGCAAGCAAGGCGTTG 1255 

Qy 301 T AGAC AGAGGAT GGGGGAAT GGAT GC GGACT GTT T GGAAAGGGGAGC AT T GACAC AT GTG 360 

I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1256 T GGAC AGAGGAT GGGGAAAT GGCT GC GGACT GTTT GGAAAGGGGAGCAT T GACACAT GTG 1315 

Qy 361 CAAAGTTT GCCT GT ACAAC CAAGGCAACT GGT T GGATT AT C C AGAAGGAAAAC AT CAAGT 420 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1316 CGAAGTTTGCCT GT ACAAC CAAAGCAACT GGAT GGAT CAT CCAGAAGGAAAACAT CAAGT 1375 

Qy 421 AC GAGGTT GCCAT ATTT GT GCAT GG C CC GACGACT GT C GAAT CAC AT GGCAATT ATT CAA 480 

I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I II II I I I I I I I I I I I I 
Db 1376 AT GAGGTT GCCAT ATTT GT GCAT GGC CC GACGACC GTT GAAT CT CAT GGC A 1426 

Qy 481 CACAGATAGGGGCTACCCAAGCAGGAAGGTT CAGCAT AACT CCAT CGGC AC CAT C CT ACA 540 

I I I I I I I M I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1427 AGAT AGGGGC C ACC CAGGCT GGAAGATT CAGT AT AACT C CAT CGGC GC CAT CTT ACA 1483 



Qy 

Db 

Qy 

Db 
Qy 

Db 

Qy 
Db 

Qy 

Db 

Qy 

Db 

Qy 

Db 

Qy 
Db 

Qy 

Db 

Qy 

Db 

Qy 
Db 

Qy 

Db 

Qy 

Db 

Qy 

Db 

Qy 



541 C GCT GAAGTT GGGTGAGT AT GGT GAG GT CACAGTT GACT GT GAGC C AC GGT CAGGAAT AG 600 
I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 

1484 C GCT AAAGTT GGGT GAGT AT GGT GAGGTTAC GGTT GAT TGT GAGC C AC GGT CAGGAAT AG 1543 

601 AC ACT AGC GCT TACT AC GTT AT GT C AGT GGGT GCGAAGTC CT T CT T GGTT C ACCGAGAAT 660 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

1544 ACACTAGCGCCTATTACGTTATGTCAGTTGGTGCGAAGTCCTTCCTGGTTCACCGAGAAT 1603 



661 



720 



GGTTTATGGACCTGAACCTTCCATGGAGTAGCGCTGGAAGCACAACGTGGAGGAACCGGG 

I I I I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I I I I I I I I I I I I I I I I I I 
1604 GGTTTATGGATCTGAACCTGCCATGGAGCAGTGCTGGAAGCACCACGTGGAGGAACCGGG 1663 



721 



780 



AAACACT GAT GGAGT T T GAAGAAC CT C AT GCC ACCAAACAAT CT GT C GT AGCT CT AGGGT 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 
1664 AAACACT GGT GGAGT T T GAAGAACCT C AT GCCACCAAACAAT CTGTT GT GGCT CT AGGGT 1723 



781 CGCAGGAAGGTGCCTTGCACCAAGCTCTGGCTGGAGCAATTCCTGTTGAGTTCTCAAGCA 840

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
1724 CGCAGGAAGGTGCGTTGCACCAAGCTCTGGCCGGAGCGATTC CTGTT GAGT TCTCAAGCA 1783 



841 ACACTGT GAAGTT GACATCAGGACATCTGAAGT GT AGGGT GAAGATGGAGAAGTTGCAGC 900
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I I I I I I I I I 
1784 AC ACT GT GAAGT T GACAT CAGGAC AT CT GAAGT GT AG GGT GAAGAT GGAGAAGTT GC AGC 1843 



901 T GAAGGGAACAACATATGGT GT ATGCT CAAAAGCATT CAAATT CGCT AGGACTCCCGCTG 960
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I 
1844 T GAAGGGAACAACATATGGAGTATGTT CAAAAGCGTT CAAATT CGCT AGGACTCCCGCTG 1903 

961 ACACTGGTCATGGAACGGTGGTGCTGGAACTGCAGTATACCGGAAAAGACGGGCCTTGCA 1020 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 
1904 ACACT GGCCAC GGAACGGTGGT GTT GGAACT GCAAT AT AC C GGAAC AGAC GGT CC CT GCA 1963 

1021 AAGTGCCCATTTCTTCTGTGGCTTCCCTGAACGACCTTACACCCGTTGGAAGGCTGGTGA 1080 

I I II I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
1964 AAGTGCCCATTTCTTCCGTAGCTTCCCTGAATGACCTCACACCTGTTGGAAGACTGGTGA 2023 

1081 CT GT GAAT C C ATT TGT GT CT GT GGCT AC GGC CAACT C GAAGGT TT T GATT GAACT CGAAC 1140 

I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
2024 C CGT GAAT CC ATTTGT GT CT GT GGC CACAGC CAACT CGAAGGT TT T GATT GAACT CGAAC 2083 

1141 CCCCGTTTAGTGACTCTTACATCGT GGT GGGGAGAGGAGAACAGCAGATAAACCACCACT 1200 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
2084 CCCCGTTTGGT GACT CT T ACAT C GT GGT GGGAAGAGGAGAACAGC AGATAAAC CAT CACT 2143 

1201 GGC ACAAAT CT GGGAGC AGT AT T GGAAAGGCT T TCAC CACT AC ACT CAGAGGAGCT CAAC 1260 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I 
2144 GGC ACAAATCT GGGAGC AGCAT T GGAAAGGC CT TT AC CAC C AC ACT CAGAGGAGCT CAAC 2203 

1261 GACTT GCAGCT CT T G GAGACACT GC CT GGGAT TTT GGATC AGT CGGAGGGGTT TT CACCT 1320 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
2204 GACTCGCAGCTCTTGGAGATACTGCTTGGGATTTTGGATCAGTTGGAGGGGTTTTCACCT 2263 

1321 C GGTAGGGAAAGC CAT ACAC CAAGT TTT T GGAGGAGCCTTT AGAT C ACT CT T T GGAGGGA 1380 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
2264 CAGTGGGGAAAGC CAT AC AC CAAGT CT TT GGAGGAGCTTTT AGAT C ACT CT T T GGAGGGA 2323 

1381 TGTCCTGGATCACACAGGGGCTTCTGGGAGCTCTTCTGCTGTGGATGGGAATTAACGCCC 1440 



1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 

Db 2324 T GT CCT GGAT C ACACAGGGACT T CT GGGAGCT CT T CT GTT GT GGAT GGGAAT CAAT GC CC 2383 

Qy 1441 GTGACAGGTCAATTGCTATGACGTTCCTTGCGGTTGGAGGAGTCTTGCTCTTCCTTTCGG 1500 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2384 GTGACAGGTCAATTGCTATGACGTTTCTTGCGGTTGGAGGAGTTTTGCTCTTCCTTTCGG 2443 

Qy 1501 TCAACGTCCATGCTG 1515 

I I I I I I I II I I I I I I 
Db 2444 TCAACGTCCATGCTG 2458 
WNFCG 10962 bp ss-RNA

West Nile virus RNA, complete genome. 
M12294 M10103 
M12294.2 GI:11497619 



West Nile virus 
West Nile virus 

Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae; 
Flavivirus; Japanese encephalitis virus group.

1 (bases 67 to 969) 

Castle,E., Nowak,T., Leidner,U., Wengler,G. and Wengler,G.
Sequence analysis of the viral core protein and the
membrane-associated proteins V1 and NV2 of the flavivirus West Nile
virus and of the genome sequence for these proteins
Virology 145 (2), 227-236 (1985)
2992152

2 (bases 859 to 2658) 

Wengler,G., Castle,E., Leidner,U., Nowak,T. and Wengler,G.
Sequence analysis of the membrane protein V3 of the flavivirus West
Nile virus and of its gene
Virology 147 (2), 264-274 (1985)
3855247

3 (bases 1 to 10962) 
Castle, E. 
Unpublished 

4 (bases 67 to 10485) 

Castle,E., Leidner,U., Nowak,T., Wengler,G. and Wengler,G.
Primary structure of the West Nile flavivirus genome region coding 
for all nonstructural proteins 
Virology 149 (1), 10-26 (1986) 
3753811 

5 (bases 1 to 10962) 

Yamshchikov,V.F., Wengler,G., Perelygin,A.A., Brinton,M.A. and
Compans,R.W.
An infectious clone of West Nile flavivirus
Virology (2000) In press

6 (bases 1 to 10962) 
Castle, E. 

Direct Submission 

Submitted (03-AUG-1993) Justus-Liebig-Universitat Giessen, Institut
fur Virologie, 35392, Giessen, Germany 

7 (bases 1 to 10962) 
Yamshchikov,V.F.
TITLE Direct Submission
JOURNAL Submitted (01-DEC-2000) University of Virginia Health Sciences
Centre, Department of Internal Medicine/GI, Charlottesville, VA
22906

COMMENT On Dec 1, 2000 this sequence version replaced gi:336167.

Draft entry and sequence in computer readable form for
[1],[2],[4],[3] kindly provided by E. Castle, 12-NOV-1985. The West
Nile viral genome consists of a 42S viral RNA. The amino-terminal
ends of the structural proteins were experimentally determined. An
'atg' codon is located at positions 142-144, which could be used
for an alternative initiation of translation for V2. The
carboxy-terminal ends of the proteins reported here were not yet
precisely defined.
FEATURES Location/Qualifiers
source 1..10962
/organism="West Nile virus"
/virion
/mol_type="genomic RNA"
/db_xref="taxon:11082"
/clone="33/G8; 34/F6"
CDS 97..10389
/codon_start=1
/product="polyprotein precursor"
/protein_id="AAA48498.2"
/db_xref="GI:11497620"

/translation="MSKKPGGPGKNRAVNMLKRGMPRGLSLIGLKRAMLSLIDGKGPI 

rfvxallaffrftaiaptravxdrwrgwkqtamkhllsfkkelgtltsainrrstkq 

kkrggt ag ft i llgl i acagavt l s n fq g kvmmtvn at dvt d vi t i pt aagkn lc i vr 

amdvgylcedtityecpvlaagndpedidcwctkssvyvrygrctktrhsrrsrrslt 

vqthgestlankkgawldstkatrylvkteswilrnpgyalvaavigwmlgsntmqrv 

vfaillllvapaysfnclgmsnrdflegvsgatwvdlvlegdscvtimskdkptidvk 

mmnmeaanladwsycylasvsdlstraacptmgeahnekradpafvck^ 

ngcglfgkgsidtcakfacttkatgwiiqkenikyevaifvhgpttveshgkigatqa 

grfsitpsapsytlklgeygevtvdceprsgidtsayyvmsvgeksflvhrewfmdln 

lpwssagsttwrnretlmefeephatkqswalgsqegalhqalagai pvefssntvk 

ltsghlkcrvto4eklqlkgttygvcskafkfartpadtghgtvvlelqytgtdgpckv 

pissvaslndltpvgrlvtvnpfvsvatanskvlieleppfgdsyiwgrgeqqinhh 

whks gs s i gkafttt lrgaqrlaalgdt awdfgs vggvft s vgkai hqvfggafrs lf 

ggmswitqgllgalllwmginardrs iamtflavggvllflsvnvhadtgcai di grq 

elrcgsgvfihndveawmdrykfypetpqglakiiqkahaegvcglrsvsrlehqmwe 

ai kdelntllkengvdls vwekqngmykaapkrlaatteklemgwkawgksi i fape 

lanntfvidgpeteecptanrawnsmevedfgfgltstrmflriretnttecdskiig 

tavt<nnmavt1sdlsywiesglndtwkleravx 

pitlagprsnhnrrpgyktqnqgpwdegrveidfdycpgttvtisdscehrgpaartt 

tesgklitdwccrsctlpplrfqtengcwygmeirptrhdektlvqsrvnaynadmid 

pfqlglmwflatqevlrkrwtaki si paimlallvxvfggitytdvlryvilvgaaf 

aeansggd whlalmatfkiqpvflvas flkarwtnqes i llmlaaaffqmayydakn 

vlswevpdvxnslsvav^ilraisftntsnvwpllalltpglkclnldwril 

gvgslikekrssaakkkgacliclalastgvfnpmilaaglmacdpnrkrgwpatevm 

tavglmfaivgglaeldidsmai pmtiaglmfaafvisgkstdmwiertaditwesda 

eitgsservtdvt^ldddgnfqlmndpgapwkiwmlrn^claisaytpwailpsvigfwi 

tlqytkrggvlwdtpspkeykkgdtttgvyrimtrgllgsyqagagvmvegvfhtlwh 

ttkgaalmsgegrldpywgsv^edrlcyggpwklqhkwtcghdevqm^ 

qtkpgvfktpegeigavtldyptgtsgspivdkngdviglygngvimpngsyisaivq 

germeepapagfepemlrkkqitv^dlhpgagktrkilpqiikeainkrlrtavlapt 

rvvaaemsealrglpiryqtsavtirehsgneivtdvwcpiatlthrlmsphrvpnynlfi 

mdeahftdpasiaargyiatkvelgeaaai fmtatppgtsdpfpesnapi sdmqtei p 



mat_peptide 
sig_peptide 
mat_peptide 
mat_peptide 

sig_peptide 
mat_peptide 

sig_peptide 
mat_peptide 
mat_peptide 
ORIGIN 



DRAVWTGYEWITEWGKTVWFVPSVKMGNEIALCLQRAGKKVIQLNRKSYETEYPKCK 

NDDWDFVITTDISEMGANFKASRVIDSRKSVKPTIIEEGDGRVILGEPSAITAASAAQ 

RRGRIGRNPSQVGDEYCYGGHTNEDDSNFAHWTEARIMLDNINMPNGLVAQLYQPERE 

KVYTMDGEYRLRGEERKNFLEFLRTADLPVWLAYKVAAAGISYHDRKWCFDGPRTNTI 

LEDNNEVEVITKLGERKILRPRWADARVYSDHQALKSFKDFASGKRSQIGLVEVLGRM 

PEHF1WKTWEALDTMYWATAEKGGRAHRMALEELPDALQTIVLIALLSVMSLGV 

LMQRKGIGKTGLGGVILGAATFFCWMAEVPGTKIAGMLLLSLLLMIVLIPEPEKQRSQ 

TDNQLAVFLICVLTLVGAVAANEMGWLDKTKNDIGSLLGHRPE7VRETTLGVESFLLDL 

RPATAWS L YAVTTAVLT PLLKHL I T S D YI NT S LT S I NVQAS AL FT LARGFP FVDVGVS 

ALLLAVGCWGQVTLTVTVTAAALLFCHYAYMVPG^ 

GI VATDVPELERTT PVMQKKVGQI I LI LVSMAAVWNP SVRTVREAGI LTTAAAVTLW 

ENGASSVWNATTAIGLCHIMRGGWLSCLSIMWTLIKNMEKPGLKRGGAKGRTLGEVWK 

ERLNHMTKEEFTRYRKEAITEVDRSAAKHARREGNITGGHPVSRGTAKLRWLVERRFL 

EPVGKVVDLGCGRGGWCYYMATQKRVQEVKGYTKGGPGHEEPQLVQSYGWNIVTMKSG 

VDVFYRPSEASDTLLCDI GES S S SAEVEEHRTVRVLEMVEDWLHRGPKEFCI KVLCPY 

MPKVIEKMETLQRRYGGGLIRNPLSRNSTHEMYWSHA5GNIVHSVNOTSQVLL 

KKTWKGPQFEEDVNLGSGTRAVGKPLLNSDTSKIKNRIERLKKEYSSTWHQDANHPYR 

TWNYHGSYEVKPTGSASSLVNGVVRLLSKPWDTITNWTMAOTDTTPFGQQRVFKEKV 

DTKAPEPPEGVKYVLNETTNWLWAFLARDKKPRMCSREEFIGKVNSNAALGAMFEEQN 

QWKNAREAV^DPKFWEMVTlEEREAHLRGECNTCIYNM^^ 

FMWLGARFLEFEALGFLNEDHWLGRKNSGGGVEGLGLQKLGYILKEVGTKPGGKVYAD 
DTAGWDTRITKADLENEAKVLELLDGEHRRLARS 1 1 ELT YRHKWKVMRPAADGKTVM 
DVI S REDQRGS GQWT YALNT FTNIJWQLVRMMEGEGVI GPDDVEKLGKGKGPKVRTW 
LFENGEERLSRMAVSGDDCVVKPLDDRFATSLHFLNAMSKVRKDIQEWKPSTGWYDWQ 
QVPFCSNHFTELIMKDGRTLWPCRGQDELIGRARISPGAGWNVRDTACLAKSYAQMW 
L L L Y FH RRDL RLMAN AI C S AVP ANWVPT GRT T W S I HAKGEWMT T E DMLAVWN RVW I E E 
NEWMEDKT PVERWS DVP YS GKREDI WCGS LI GTRTRATWAENI HVAINQVRSVI GEEK 
YVDYMS S LRRYEDT I WEDTVL " 
mat_peptide 97..372
/product="V2 (14kd core protein)"
sig_peptide 409..465
/note="NV2 signal peptide"
mat_peptide 466..765
/product="NV2 (20.5 kd membrane-associated glycoprotein)"
mat_peptide 742..765
/product="V1 (7 kd membrane-associated nonglycosylated
protein"
sig_peptide 919..966
/note="V3 signal peptide"
mat_peptide 967..2457
/product="V3 (50 kd membrane-associated glycoprotein;
putative); putative"
sig_peptide 2386..2457
/note="nonstructural protein NV4 signal peptide"
mat_peptide 2458..6426
/product="nonstructural protein NV4"
mat_peptide 7834..10380
/product="nonstructural protein NV5"
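The feature locations listed for this record are inclusive base ranges, so the length of each feature is end minus start plus one. The following is a minimal sketch of that arithmetic, not part of the GenBank record. The helper name `span_length` is hypothetical, and it assumes a simple `start..end` location written without the space that the text extraction introduced (optionally with the `<` and `>` partial markers used elsewhere in this report).

```python
import re

def span_length(location: str) -> int:
    """Length in bases of a simple GenBank location such as '97..372'.

    Hypothetical helper for illustration only. It handles plain
    'start..end' ranges, optionally with '<' / '>' partial markers,
    and nothing more elaborate (no join(), no complement()).
    """
    m = re.fullmatch(r"<?(\d+)\.\.>?(\d+)", location.strip())
    if m is None:
        raise ValueError(f"unsupported location: {location!r}")
    start, end = int(m.group(1)), int(m.group(2))
    # GenBank coordinates are 1-based and inclusive at both ends.
    return end - start + 1

# Locations copied from the feature list of this record.
for loc in ("97..372", "919..966", "967..2457", "2458..6426"):
    print(loc, span_length(loc))
```

Under these assumptions the V3 mature peptide (967..2457) comes out at 1491 bases, the same figure given as the length of patent sequence AR365300 later in this report.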



Query Match 86.8%; Score 1319.4; DB 10; Length 10962; 

Best Local Similarity 92.5%; Pred. No. 0; 

Matches 1402; Conservative 0; Mismatches 101; Indels 12; Gaps 1; 
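The counts on the line above can be cross-checked against the reported Best Local Similarity. The sketch below assumes that this percentage is the number of matches divided by the number of aligned columns (matches + mismatches + indels). That definition is an assumption, not something the report states, but it reproduces the percentages printed for the three hits whose summaries appear in this section. The function name is hypothetical.

```python
def best_local_similarity(matches: int, mismatches: int, indels: int) -> float:
    """Percent identity over aligned columns (assumed definition)."""
    columns = matches + mismatches + indels
    return 100.0 * matches / columns

# (matches, mismatches, indels) as printed in the hit summaries of this report.
hits = {
    "M12294": (1402, 101, 12),
    "AR365300": (1393, 98, 12),
    "AF394221": (1164, 81, 12),
}

for name, counts in hits.items():
    print(name, round(best_local_similarity(*counts), 1))
```

The Query Match percentage is a different quantity that depends on the alignment score, and it is not reproduced here.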



Qy 1 CGGAATTCAGCTTCAACTGTTTAGGAATGAGCAACAGGGACTTCCTGGAGGGAGTGTCTG 60
I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 956 CAGCAT AC AGCTT CAACT GT TT AGGAAT GAGT AAC AGAGACT T CCT GGAGGGAGT GT CTG 1015



Qy 61 GAGCT ACATGGGTT GAT CT GGT ACT GGAAGGAGAC AGTT GT GT GACC AT AAT GT CAAAAG 120 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I II I I I I I I I I II I I I I I I 

Db 1016 GAGCT AC ATGGGT T GAT CT GGT ACT GGAAGGC GAT AGTT GT GT GACC ATAATGT CAAAAG 1075 

Qy 121 ACAAGCCAACCATT GAT GT CAAAATGATGAACATGGAAGCAGCTAATCTCGCAGAT GTGC 180 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II I II II I I I I I I I I 

Db 1076 ACAAGCCAAC C ATT GAT GT CAAAAT GAT GAAC AT GGAAG C AGC CAAC CT C GCAGAT GT GC 1135 

Qy 181 GT AGCT ACTGCT ACTT AGCT TC GGT CAGT GAT CT GT CAACAAAAGCC GCGT GT C CAACC A 240 

I II I II I I III I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I I I I I I I I I 

Db 1136 GCAGTT ACTGTTAC CT AGCT T CGGT CAGT GACTT GTCAACAAGAGCT GCGT GT C CAACCA 1195 

Qy 241 T GGGT GAAGCT C ACAAC GAGAAAAGAGC CGAC CCT GCCTT T GT T T GCAAGCAAGGC GTCG 300 

I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1196 T GGGT GAAGC C CACAAC GAGAAAAGAGCT GAC C C C GCCTT C GTT T GCAAGCAAGGCGTT G 1255 

Qy 301 T AGAC AGAGGAT GGGGGAAT GGAT GC GGACT GTTT GGAAAGGGGAGC ATT GACACAT GT G 360 

I I I I I I I I I I I I I I I I I I I I I'l I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 

Db 1256 T GGACAGAGGATGGGGAAATGGCTGCGGACTGTTT GGAAAGGGGAGCATT GACACAT GTG 1315 

Qy 361 CAAAGTT T GC CTGT ACAAC CAAGGCAACT GGTTGGAT TAT C CAGAAGGAAAAC AT CAAGT 420 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II I I I I I I I I I 

Db 1316 C GAAGTTT GCCTGT ACAAC CAAAGCAACT GGAT GGAT CAT C CAGAAGGAAAAC AT CAAGT 1375 

Qy 421 AC GAGGTT GCCAT ATTT GT GCAT GGCCC GAC GACT GT CGAATCACAT GGCAAT T ATT CAA 480

I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I I II I I I I I I I I I II I 

Db 1376 AT GAGGTT GC CAT AT TT GT GCAT GGCCC GAC GACC GT T GAAT CT CAT GGCA 1426 

Qy 481 C AC AGAT AGGGGCT AC CCAAGC AGGAAGGTT CAGCAT AACT C CAT C GGCAC CAT CCT AC A 540 

I I II II I I I I I I I I I II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1427 AGAT AGGGGC C ACC CAGGCT GGAAGATT CAGT AT AACT C CAT C GGC GC C AT CTT AC A 1483 

Qy 541 CGCT GAAGTT GGGT GAGT AT GGT GAGGT CAC AGTT GACT GT GAGCCACGGT CAGGAATAG 600 

I I I I II I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1484 C GCTAAAGT T GGGT GAGT AT GGT GAGGT T AC GGTT GAT T GT GAGC CACGGT CAGGAATAG 1543 

Qy 601 ACACTAGCGCTTACTACGTTATGTCAGTGGGTGCGAAGTCCTTCTTGGTTCACCGAGAAT 660 

I I I I Mill II I I I I I I I I II I I I I I I I I I I I I I I II I I I I I I I I I II I I I I I I 

Db 1544 ACAC CAGCGC CT ATT AC GTT AT GT CAGT T GGT GAGAAGT C CTT C CT GGT T C ACCGAGAAT 1603 

Qy 661 GGTTT AT GGAC CT GAAC CTT CCAT GGAGT AGC GCT GGAAGCACAACGT GGAGGAAC C GGG 720 

I I I I I I I II I I I I I I I I I I I I I I I I I II I II I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1604 GGT T TAT GGAT CT GAAC CT GCCAT GGAGCAGT GCT GGAAGCAC CAC GT GGAGGAAC CGGG 1663 

Qy 721 AAACACT GAT GGAGT TT GAAGAAC CT CAT GC CAC CAAACAAT CT GT C GT AGCT CT AGGGT 780 

I II I I I I II I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I II I I I I I I I I I I 

Db 1664 AAACACT GAT GGAGT TT GAAGAACCT CAT GCCAC CAAACAAT CT GT T GT GGCT CT AGGGT 1723 

Qy 781 CGCAGGAAGGTGCCTTGCACCAAGCTCTGGCTGGAGCAATTCCTGTTGAGTTCTCAAGCA 840 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I II Mill II I I I I I I I I I I I I I I II I I I I 

Db 1724 C GCAGGAAGGT GCGTT GC AC CAAGCT CT GGC CGGAGC GAT T C CT GT T GAGT T CT CAAGCA 1783 

Qy 841 ACACT GT GAAGTT GAC AT CAGGACAT CT GAAGT GT AGGGT GAAGAT GGAGAAGTTGC AGC 900 

II I I I I II I I I I I II II II II I I II I I I I I I II M I I I I I I I I II I I I I I I II II I I I I 

Db 1784 ACACT GT GAAGT T GAC AT CAGGACAT CT GAAGT GT CGGGT GAAGAT GGAGAAGTT GCAGC 1843 






901 T GAAGG GAACAAC AT AT GGT GT AT GCT CAAAAGCATT CAAAT T CGCT AGGACTC CC GCT G 960 
I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
1844 T GAAGGGAACAACATAT GGAGT AT GT T CAAAAGC GTT CAAAT T C GCT AG GACT C C CGCT G 1903 

961 ACACTGGTCATGGAACGGTGGTGCTGGAACTGCAGTATACCGGAAAAGACGGGCCTTGCA 1020 
I I I II I I II MINIMUM I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 
1904 ACACT GG CC AC GGAAC GGT GGT GTT GGAACT GCAATAT AC CGGAACAGAC GGT C CCT GCA 1963 

1021 AAGTGCCCATTTCTTCTGTGGCTTCCCTGAACGACCTTACACCCGTTGGAAGGCTGGTGA 1080 

I I I I II I I I II I I I I I II I I I I- II I I II I I I I I I I I I I I I I I I II II I I I I I I I 
1964 AAGTGCCCATTTCTTCCGTAGCTTCCCTGAATGACCTCACACCTGTTGGAAGACTGGTGA 2023 

1081 CTGTGAATCCATTTGTGTCTGTGGCTACGGCCAACTCGAAGGTTTTGATTGAACTCGAAC 1140 

I I I I I I I I I II I I I I I I I I II I I I II I I II I II I I I II I II I I I I I I I I I I I I I II I 
2024 C CGT GAAT CCATT T GT GT CT GTGGCCACAGC CAACTC GAAGGTT TT GAT T GAACT CGAAC 2083 

1141 C C CC GTTT AGT GACT CTT ACATC GT G GT GG GGAGAGGAGAACAGCAGATAAACC AC CACT 1200 

I I I I I I II I I I II I I I II II I I I I I I I I I I I I I I I I I I I I I II I I I II I I I I I I I II 
2084 C CCC GTTT GGT GACT CTT ACATC GT GGT GGGAAGAGGAGAAC AG CAGATAAAC CAT CACT 2143 

1201 GGCACAAATCTGGGAGCAGTATT GGAAAGGCTTTCACCACTACACTCAGAGGAGCT CAAC 1260 

I I II I I I I I II I I II I I I I I I I I I I I I I II II Mill I I II I I I I I I I I I I I I I I I 
2144 GGCACAAAT CT GG GAGCAGCATT GGAAAGGC CT TT AC CACCACACT C AGAGGAG CT CAAC 2203 

1261 GACTTGCAGCTCTTGGAGACACTGCCTGGGATTTTGGATCAGTCGGAGGGGTTTTCACCT 1320 

I I I I I I I I I I I I I I I II I Mill I I M I I I I M I II I II I I I I I I I I I II II II I I 

2204 GACTCGCAGCTCTTGGAGATACTGCTTGGGATTTTGGATCAGTTGGAGGGGTTTTCACCT 2263 

1321 CGGTAGGGAAAGCCATACACCAAGTTTTTGGAGGAGCCTTTAGATCACTCTTTGGAGGGA 1380 

I II I I I I I I I I I I I II I I I M M I I I I I M I II I I I I I I I I I I I I I I I I I II II I I 
2264 C AGT GGGGAAAGCCAT ACAC CAAGT CTT T GGAGGAGCTTTT AGAT CACT CT TT GGAGGGA 2323 

1381 TGTCCTGGATCACACAGGGGCTTCTGGGAGCTCTTCTGCTGTGGATGGGAATTAACGCCC 1440 

I I I I I I I I I I I I I I I I M I I I I M I I I I II I I I M I I II I I I I I I I I I I I II II I I 

2324 TGTCCTGGATCACACAGGGACTTCTGGGAGCTCTTCTGTTGTGGATGGGAATCAATGCCC 2383 

1441 GTGACAGGTCAATTGCTATGACGTTCCTTGCGGTTGGAGGAGTCTTGCTCTTCCTTTCGG 1500 

I I II I I I I I I I I I I I I II I II II M I I I I I I M I I I I I I II I I I I I I I I I I I I I J I I I 
2384 GT GACAGGT CAATT GCT AT GACGTT T CT TGC GGT T GGAGGAGTT T T GCT CT T C CTT T CGG 2443 

1501 TCAACGTCCATGCTG 1515 

II I I I I I I I I I I I I I 
2444 TCAACGTCCATGCTG 2458 
1491 bp DNA linear PAT 03-SEP-2003

Sequence 3 from patent US 5486473.
AR365300
AR365300. GI:34428831

Unknown.
Unknown.
Unclassified.

1 (bases 1 to 1491)
Fujita,H., Yoshida,I., Takagi,M., Manabe,S. and Fukai,K.
TITLE A DNA coding for a Flavivirus antigen
JOURNAL Patent: US 5486473-A 3 23-JAN-1996;
The Research Foundation for Microbial Diseases of Osaka University;
Osaka;
JPX;
FEATURES Location/Qualifiers 
source 1..1491
/organism="unknown"
/mol_type="genomic DNA"

ORIGIN 

Query Match 86.3%; Score 1312.2; DB 2; Length 1491; 

Best Local Similarity 92.7%; Pred. No. 0; 

Matches 1393; Conservative 0; Mismatches 98; Indels 12; Gaps 1; 

Qy 12 T TCAACT GTTT AGGAAT GAGCAAC AGGGACTT C CTGGAGGGAGT GTCT GGAGCTACAT GG 71 

I I I I I I I I I I I I I I I I II I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1 T TCAACTGTTT AGGAAT GAGTAACAGAGACTT C CTGGAGGGAGT GTCT GGAGCTACAT GG 60 

Qy 72 GTT GAT CT GGT ACT GGAAGGAGAC AGT T GT GT GACC ATAAT GT CAAAAGACAAGC CAACC 131 

I I I I II I I I I I I I I I I I I II II I I I II I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 
Db 61 GTTGAT CTGGTACT GGAAGGCGAT AGTT GTGTGACCATAATGT CAAAAGACAAGCCAACC 120 

Qy 132 ATTGAT GTCAAAAT GAT GAACAT GGAAGCAGCTAATCTCGCAGATGT GCGTAGCTACT GC 191 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I II I II I I I I I 
Db 121 ATT GAT GT CAAAAT GAT GAACAT GGAAGCAGC CAAC CT CGCAGAT GT GCGCAGTT ACT GT 180 

Qy 192 TACTTAGCTTCGGTCAGTGATCTGTCAACAAAAGCCGCGTGTCCAACCATGGGTGAAGCT 251 

III I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I I I I I I I II I I I I I I I I I I 
Db 181 TACCTAGCTTCGGTCAGTGACTTGTCAACAAGAGCTGCGTGTCCAACCATGGGTGAAGCC 240 

Qy 252 C ACAAC GAGAAAAGAGC C GACC CT GC CT T T GTTT GCAAGCAAGGC GT CGT AGACAGAGGA 311 

I I II I I I I I I I I I I I I I II I I I I I 1 I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 
Db 241 CACAAC GAGAAAAGAGCT GACCC C GC CT T CGTTT GCAAGCAAGGC GTT GT GGACAGAGGA 300 

Qy 312 TGGGGGAATGGATGCGGACTGTTTGGAAAGGGGAGCATTGACACATGTGCAAAGTTTGCC 371 

I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 301 TGGGGAAATGGCTGCGGACTGTTTGGAAAGGGGAGCATTGACACATGTGCGAAGTTTGCC 360 

Qy 372 T GT ACAAC CAAGGCAACT GGTT GGAT TAT C C AGAAGGAAAAC AT CAAGTACGAGGTTGCC 431 

I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 361 TGTACAACCAAAGCAACTGGAT GGAT CAT CC AGAAGGAAAAC AT CAAGTAT GAGGTTGCC 420 

Qy 432 ATATTT GT GCAT GGCCCGACGACTGT CGAATCACAT GGCAATTATTCAACACAGATAGGG 491 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 421 AT AT TT GT GCAT GGC C C GAC GAC CGT T GAAT CT CAT GGC A AGATAGGG 468 

Qy 492 GCTACC CAAGCAGGAAGGTT CAGCATAACT CCAT CGGC ACC AT C CTACACGCT GAAGTT G 551 

II I I I I I II I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 469 GCCACC CAGGCT GGAAGATT CAGT ATAACT C CAT CGG CGC CAT CT T ACACGCT AAAGT T G 528 

Qy 552 GGT GAGT AT GGT GAGGT C AC AGTT GACT GT GAGC CAC GGT CAGGAAT AGAC ACTAGCGCT 611 

I I I I II I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 529 GGT GAGT AT GGT GAGGTTAC GGTT GAT T GT GAGC CAC GGT CAGGAAT AGAC AC CAGCGCC 588 



Qy 612 T ACTAC GT TAT GTCAGT GGGT GCGAAGT CCT T CT TGGTT CAC C GAGAAT GGTTT AT GGAC 671
II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I II I I I I I II
Db 589 T ATT AC GTT AT GT C AGT T G GT GAGAAGT C CTT C CT GGTT C AC C GAGAAT GGTTT AT GGAT 648

Qy 672 CT GAAC CT T C C AT GGAGT AGCGCTGGAAGC ACAAC GT GGAGGAAC C GGGAAAC ACT GAT G 731
I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I I
Db 649 CT GAAC CT GC CAT GGAG CAGTGCTGGAAGCAC CAC GT G GAGGAAC C GGGAAACACT GAT G 708

Qy 732 GAGTTTGAAGAACCTCATGCCACCAAACAATCTGTCGTAGCTCTAGGGTCGCAGGAAGGT 791
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I
Db 709 GAGT TT GAAGAAC CTCATGC CAC CAAACAAT CT GTT GT G GCT CT AGGGT C GCAGGAAGGT 768

Qy 792 GCCTTGCACCAAGCTCTGGCTGGAGCAATTCCTGTTGAGTTCTCAAGCAACACTGTGAAG 851
II I I I I I I I I I I II I I I I I Mill I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 769 GCGTTGCACCAAGCTCTGGCCGGAGCGATTCCTGTTGAGTTCTCAAGCAACACTGTGAAG 828

Qy 852 T T GACAT C AGGAC AT CT GAAGT GT AGGGT GAAGAT GGAGAAGT T GCAGCT GAAGGGAACA 911
I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 829 TTGACAT CAGGACAT CT GAAGT GTCGGGT GAAGAT GGAGAAGT T GCAGCT GAAGGGAACA 888

Qy 912 ACAT AT GGT GT AT GCT CAAAAGC AT T CAAAT T CGCT AGGACT C C CGCT GAC ACT GGT CAT 971
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II
Db 889 ACAT AT G GAGT AT GT T CAAAAGC GTT CAAAT T CGCT AGGACT CC CGCT GAC ACT GGC CAC 948

Qy 972 GGAACGGTGGTGCTGGAACTGCAGTATACCGGAAAAGACGGGCCTTGCAAAGTGCCCATT 1031
I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I
Db 949 GGAACGGTGGTGTTGGAACTGCAATATACCGGAACAGACGGTCCCTGCAAAGTGCCCATT 1008



1032 TCTTCTGTGGCTTCCCTGAACGACCTTACACCCGTTGGAAGGCTGGTGACTGTGAATCCA 1091 

I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I 
1009 T CTT C C GT AGCTT C C CT GAATGACCT CAC AC CT GTT GGAAGACT GGT GACC GT GAAT CCA 1068 

1092 TTTGTGTCTGTGGCTACGGCCAACTCGAAGGTTTTGATTGAACTCGAACCCCCGTTTAGT 1151 

I I I I I I I II II I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
1069 T TT GTGT CT GT GGC C AC AGC CAACT CGAAGGT TT T GATT GAACT C GAAC CCC CGTT T GGT 1128 

1152 GACT CTT ACAT CGT GGT GGGGAGAGGAGAACAGCAGATAAAC C ACC ACT GGC ACAAATCT 1211 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
1129 GACTCTTACAT CGT GGTGGGAAGAGGAGAACAGCAGATAAACCATCACTGGCACAAATCT 1188 

1212 GGGAGCAGT AT T GGAAAGGCTT T CAC CACT AC ACT CAGAGGAGCT C AAC GACTT GCAGCT 1271 

I I I I I I I I I I I I I I I I I I I II I I I I I II I II I I I I I I I I I I I I I I I I I I I I I I I I 
1189 GGGAGCAGCATT GGAAAGGC CT TTAC CAC CAC ACT CAGAGGAGCT CAAC GACT CGCAGCT 1248 

1272 CTTGGAGACACTGCCTGGGATTTTGGATCAGTCGGAGGGGTTTTCACCTCGGTAGGGAAA 1331 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I 
1249 CTTGGAGATACTGCTTGGGATTTTGGATCAGTTGGAGGGGTTTTCACCTCAGTGGGGAAA 1308 

1332 GC CAT AC AC CAAGTT TTTGGAGGAGC CTTT AGAT CACT CT TTG GAGGGAT GT C CT GGAT C 1391 

I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II I I I I I I I I I II I I 
1309 GCCATACACCAAGTCTTTGGAGGAGCTTTTAGATCACTCTTTGGAGGGATGTCCTGGATC 1368 

1392 ACACAGGGGCTTCTGGGAGCTCTTCTGCTGTGGATGGGAATTAACGCCCGTGACAGGTCA 1451 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I II I 
1369 ACACAGGGACTTCTGGGAGCTCTTCTGTTGTGGATGGGAATCAATGCCCGTGACAGGTCA 1428 

1452 ATTGCTATGACGTTCCTTGCGGTTGGAGGAGTCTTGCTCTTCCTTTCGGTCAACGTCCAT 1511 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
1429 ATTGCTATGACGTTTCTTGCGGTTGGAGGAGTTTTGCTCTTCCTTTCGGTCAACGTCCAT 1488 



Qy 1512 GCT 1514
I I I
Db 1489 GCT 1491
AF394221 1430 bp mRNA linear VRL 03-MAY-2002

West Nile virus isolate B956 polyprotein mRNA, envelope
glycoprotein E and nonstructural protein 1 region, partial cds.
AF394221 

AF394221.1 GI:20428494 

West Nile virus 
West Nile virus 

Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae; 
Flavivirus; Japanese encephalitis virus group.

1 (bases 1 to 1430) 

Briese,T., Rambaut,A., Pathmajeyan,M., Bishara,J., Weinberger,M.,
Pitlik,S. and Lipkin,W.I.

Phylogenetic analysis of a human isolate from the 2000 Israel West 
Nile virus epidemic 

Emerging Infect. Dis. 8 (5), 528-531 (2002) 
11996693 

2 (bases 1 to 1430) 

Briese,T., Jordan,I., Pathmajeyan,M. and Lipkin,W.I.
Direct Submission 

Submitted (21-JUN-2001) Emerging Diseases Laboratory, Microbiology
& Molecular Genetics, and Neurology, University California Irvine, 
3107 Gillespie Neuroscience Building, Irvine, CA 92697-4292, USA 
Location/Qualifiers 
1..1430

/organism="West Nile virus" 
/mol_type="mRNA" 
/isolate="B956" 
/db_xref="taxon:11082"
/country="Uganda"

/note="isolated from human serum in 1937; kindly provided 
by Bob Tesh, University of Texas Medical Br., Galveston, 
TX, USA" 
<1..>1430
/codon_start=3
/product="polyprotein"
/protein_id="AAM21944.1"
/db_xref="GI:20428495"

/translation="KRADPAFVCKQGWDRGWGNGCGLFGKGSIDTCAKFACTTKATG
WIIQKENIKYEVAIFVHGPTTVESHGKIGATQAGRFSITPSAPSYTLKLGEYGEVTVD 
CEPRSGIDTSAYYVMSVGAKSFLVHREWFMDLNLPWSSAGSTTWRNRETLVEFEEPHA 
TKQSWALGSQEGALHQALAGAI PVEFSSNTVKLTSGHLKCRVKMEKLQLKGTTYGVC 
SKAFKFARTPADTGHGTVVLELQYTGTDGPCKVPISSVASLNDLTPVGRLVTVNPFVS 
VATANSKVXIELEPPFGDSYIWGRGEQQINHHWHKSGSSIGKAPTTTLRGAQRLAAL 
GDTAWDFGSVGGVFTSVGKAIHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDR 
SIAMTFLAVGGVLLFLSVNVHADTGCAIDIGRQELRCGSGVFIHNDVEAWMDRYKFYP 
ETPQGLAKI IQKAHAEGVCGLRSVSR" 
<1..1244

/product="envelope glycoprotein E" 



mat_peptide 1245..>1430

/product="nonstructural protein 1"
/note="NS1"

ORIGIN 

Query Match 71.9%; Score 1093.4; DB 10; Length 1430; 

Best Local Similarity 92.6%; Pred. No. 0; 

Matches 1164; Conservative 0; Mismatches 81; Indels 12; Gaps 1; 

Qy 259 AGAAAAGAGC CGAC C CT GCCT TT GTT TGCAAGCAAGGCGT CGT AGACAGAGGAT GGGGGA 318 

I I I I I I I I I I I I I I I Mill I I I I I I I I I I I II I I I I II I I I I I I I I I I I I I I I 
Db 1 AGAAAAGAGCTGACCCCGCCTTCGTTTGCAAGCAAGGCGTTGTGGACAGAGGATGGGGAA 60 

Qy 319 AT GGAT GC GGACT GTTT GGAAAGGGGAGCATTGACACAT GT GCAAAGT TT GC CT GT ACAA 378 

I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I 
Db 61 ATGGCTGCGGACTGTTTGGAAAGGGGAGCATTGACACATGTGCGAAGTTTGCCTGTACAA 120 

Qy 379 CCAAGGCAACTGGTTGGATTAT CCAGAAGGAAAACAT CAAGTACGAGGTTGCCATATTTG 438 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I 
Db 121 CCAAAGCAACT GGAT GGAT CAT C CAGAAGGAAAAC AT CAAGT AT GAGGT T G C CAT AT TT G 180 

Qy 439 T GCAT GGC C CGAC GACT GT CGAATCACAT GGCAAT TAT T CAAC AC AGAT AGGGGCT ACC C 498 

I I I I I I II II I I I I I I II I I I I I I II I I I I I I I I I I I I I I I I I I 

Db 181 T GCAT GGC C C GAC GACC GTT GAATCT CAT GGCA AGATAGGGGCCACCC 228 

Qy 499 AAGCAGGAAGGTT C AGCATAACT CCAT C GGC AC CAT CCT AC AC GCT GAAGTT GGGT GAGT 558 

I II I I I II I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I 
Db 229 AGGCT GGAAGAT T C AGT AT AACT CCAT C GGC GC CAT CT T AC AC GCTAAAGTT GGGT GAGT 288 

Qy 559 AT GGT GAGGT C ACAGTT GACT GT GAGC C AC GGT CAGGAAT AGACACT AG CGCTT ACT AC G 618 

I I I I I I I I II II I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I II I II I 
Db 289 AT GGT GAGGTT ACGGTT GATT GT GAGCCACGGT CAGGAAT AGACACT AGC GC CT AT T AC G 348 

Qy 619 TTATGTCAGTGGGTGCGAAGTCCTTCTTGGTTCACCGAGAATGGTTTATGGACCTGAACC 678 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 349 TTATGTCAGTTGGTGCGAAGTCCTTCCTGGTTCACCGAGAATGGTTTATGGATCTGAACC 408 

Qy 679 TT CCATGGAGTAGCGCT GGAAGCACAACGT GGAGGAACCGGGAAACACT GATGGAGTTTG 738 

I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 409 T GC CAT GGAGCAGT GCT GGAAGC AC CAC GT GGAGGAAC C GGGAAACACT GGT GGAGTTT G 468

Qy 739 AAGAACCTCATGCCACCAAACAATCTGTCGTAGCTCTAGGGTCGCAGGAAGGTGCCTTGC 798 

I I I I I I I I I I I I I I I I I I I I I I I I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 469 AAGAACCTCATGCCACCAAACAATCTGTTGTGGCTCTAGGGTCGCAGGAAGGTGCGTTGC 528 

Qy 799 ACCAAGCT CT GGCT GGAGCAATT CCT GTT GAGT T CT CAAGCAACACT GT GAAGT T GAC AT 858 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 
Db 529 ACCAAGCT CT GG C C GGAGCGATT C CT GTT GAGT T CT CAAGCAACACT GT GAAGT T GACAT 588 

Qy 859 C AGGACATCT GAAGT GT AGGGT GAAGAT GGAGAAGT T GC AGCT GAAGGGAACAAC AT AT G 918 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I 
Db 589 CAGGACAT CT GAAGT GT AGGGT GAAGATGGAGAAGT T GC AGCT GAAGGGAACAAC AT AT G 648 

Qy 919 GTGTATGCTCAAAAGCATTCAAATTCGCTAGGACTCCCGCTGACACTGGTCATGGAACGG 978 

I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I II I I 
Db 649 GAGT AT GTT CAAAAGC GTTCAAATT C GCT AGGACT C C CGCT GACACT GGCCACGGAACGG 708 



Qy 979 TGGTGCTGGAACTGCAGTATACCGGAAAAGACGGGCCTTGCAAAGTGCCCATTTCTTCTG 1038

I I I II II I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I II I I I I
Db 709 TGGTGTTGGAACTGCAATATACCGGAACAGACGGTCCCTGCAAAGTGCCCATTTCTTCCG 768



Qy 1039 TGGCTTCCCTGAACGACCTTACACCCGTTGGAAGGCTGGTGACTGTGAATCCATTTGTGT 1098 

I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II I II I I I I I I I 
Db 769 T AGCT T CC CT GAAT GACCT CACAC CT GTT GGAAGACT GGT GACCGT GAAT CC ATTT GT GT 828 

Qy 1099 CTGTGGCTACGGCCAACTCGAAGGTTTTGATTGAACTCGAACCCCCGTTTAGTGACTCTT 1158 

I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 829 CTGTGGCCACAGCCAACTCGAAGGTTTTGATTGAACTCGAACCCCCGTTTGGTGACTCTT 888 

Qy 1159 AC ATC GT GGT GGGGAGAGGAGAACAGCAGATAAACCACCACT GGCACAAAT CT GGGAGCA 1218 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 889 ACATCGT GGT GGGAAGAGGAGAACAGCAGATAAACCATCACT GGCACAAAT CTGGGAGCA 948 

Qy 1219 GT ATT GGAAAGGCTTT CACC ACT AC ACT CAGAGGAG CT CAAC GACT T GCAGCT CT T GGAG 1278 

I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 949 GCATTGGAAAGGCCTTTACCACCACACTCAGAGGAGCT CAAC GACT CGCAGCT CTT GGAG 1008 

Qy 1279 ACACTGCCTGGGATTTTGGATCAGTCGGAGGGGTTTTCACCTCGGTAGGGAAAGCCATAC 1338

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I II 
Db 1009 ATACTGCTTGGGATTTTGGATCAGTTGGAGGGGTTTTCACCTCAGTGGGGAAAGCCATAC 1068 

Qy 1339 AC CAAGTTT T T GGAGGAGC CT T T AGAT C ACT CT TT GGAGGGAT GT C CT GGAT CACAC AGG 1398 

I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1069 ACCAAGTCTTTGGAGGAGCTTTTAGATCACTCTTTGGAGGGATGTCCTGGATCACACAGG 1128 

Qy 1399 GGCTTCTGGGAGCTCTTCTGCTGTGGATGGGAATTAACGCCCGTGACAGGTCAATTGCTA 1458 

I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 
Db 1129 GACTT CT GGGAGCT CT T CT GTT GTGGAT GG GAAT CAAT GC CC GTGAC AGGT CAAT T GCTA 1188 

Qy 1459 TGACGTTCCTTGCGGTTGGAGGAGTCTTGCTCTTCCTTTCGGTCAACGTCCATGCTG 1515 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1189 TGACGTTTCTTGCGGTTGGAGGAGTTTTGCTCTTCCTTTCGGTCAACGTCCATGCTG 1245 
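The line between each Qy and Db row marks identical positions; the scan has rendered the vertical bars as spaced "I" characters. As a minimal sketch (the function name `match_line` is hypothetical, and gap-free rows are assumed because the report lists Indels 0 and Gaps 0), the marker line for the final row above can be recomputed from the two sequences:

```python
def match_line(query: str, subject: str) -> str:
    """Return '|' under identical positions and ' ' under mismatches."""
    if len(query) != len(subject):
        raise ValueError("rows must be the same length (no gaps assumed)")
    return "".join("|" if q == s else " " for q, s in zip(query, subject))


# Final row of the alignment above (Qy 1459-1515 against Db 1189-1245).
qy = "TGACGTTCCTTGCGGTTGGAGGAGTCTTGCTCTTCCTTTCGGTCAACGTCCATGCTG"
db = "TGACGTTTCTTGCGGTTGGAGGAGTTTTGCTCTTCCTTTCGGTCAACGTCCATGCTG"

line = match_line(qy, db)
matches = line.count("|")

print(qy)
print(line)
print(db)
print(f"{matches} matches, {len(line) - matches} mismatches")
```

Rows whose sequences were split by stray spaces in the scan would need those spaces removed before being compared this way.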
AY701413 10945 bp RNA linear VRL 08-FEB-2005

West Nile virus strain 04.05 polyprotein gene, complete cds . 
AY701413 

AY701413.1 GI:51011375

West Nile virus (WNV) 
West Nile virus 

Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;
Flavivirus; Japanese encephalitis virus group.

1 (bases 1 to 10945) 

Schuffenecker,I., Peyrefitte,C.N., el Harrak,M., Murri,S.,
Leblond,A. and Zeller,H.G.

West Nile virus in Morocco, 2003 

Emerging Infect. Dis. 11 (2), 306-309 (2005) 

15752452 

2 (bases 1 to 10945) 

Schuffenecker,I., Murri,S. and Zeller,H.G.
Direct Submission 



JOURNAL Submitted (29-JUL-2004) CNR Arbovirus, Institut Pasteur, 21 Avenue
Tony Garnier, Lyon cedex 07 69365, France
FEATURES Location/Qualifiers 
source 1..10945

/organism="West Nile virus"
/mol_type="genomic RNA"
/strain="04.05"
/isolation_source="brain of horse with encephalitis"
/specific_host="horse"
/db_xref="taxon:11082"
/country="Morocco"
/collection_date="2003"
CDS 55..10356

/codon_start=1
/product="polyprotein"
/protein_id="AAT92099.1"
/db_xref="GI:51011376"

/translation="MSKKPGGPGKSRAWMLKRGMPRVLSLIGLKRAMLSLIDGKGPI
REVLALIAFFRFTAIAPTRAVLDRWRGWKQTAMKHLLSFKKELGTLTSAINRRSSKQ 
KKRGGKTGIAvlVIIGLIASVGAVTLSNFQGKvNMTW 

AMDVGYMCDDTITYECPVLSAGNDPEDIDCWCTKSAVYVRYGRCTKTRHSRRSRRSLT 
VQT HGE S T LAN KKGAWMD S T KAT RYLVKT E S W I LRN P G YALVAAVI GWMLGS NTMQ RV 
VFVVXLLLVAPAYSFNCLGMSNRDFLEGVSGATWV^LVXEGDSCVTriMSKDKPTIDVK 
MMNMEAANLAEVT^SYCYLATV^ 

NGCGLFGKGS I DTCAKFACSTKATGRTILKENI KYEVAI FVHGPTTVESHGNYSTQIG 

ATQAGRFSITPAAPSYTLKLGEYGEVTVDCEPRSGIDTNAYYVMTVGTKTFLVHREWF 

MDLNLPWS SAGSTVWRNRETLMEFEEPHATKQSVI ALGSQEGALHQALAGAI PVEFS S 

NTVKLTSGHLKCRVKMEKLQLKGTTYGVCSKAFKFLGT PADTGHGTWLELQYTGTDG 

PCKVPISSVASLNDLTPVGRLWWPFVSVATANAKVLIELEPPFGDSYIWGRGEQQ 

INHHWHKSGSSIGKAFTTTLKGAQRLAALGDTAWDFGSVGGVFTSVGKAVHQVFGGAF 

RSLFGGMSWITQGLLGALLLWMGINARDRSIALTFIAVGGVTiLFLSVWHADTGCAID 

ISRQELRCGSGVFIHNDVEAWMDRYKYYPETPQGLAKIIQKAHKEGVCGLRSVSRLEH 

QMWESWDELNTLLKENGVDLSVVA/EKQEGMYKSAPKRLTATTEKLEIGWKAWGKSIL 

FAPELANNTFWDGPETKECPTQNRAWNSLEVEDFGFGLTSTRMFLKVRESNTTECDS 

KI I GTAI KNNLAIHSDLS YWI ES RLNDTWKLERAVLGEVKSCTWPETHTLWGDGI LES 

DLI I PVTLAGPRSNHNRRPGYKTQNQGPWDEGRVEI DFDYCPGTTVTLSESCGHRGPA 

TRTTTESGKLITDWCCRSCTLPPLRYQTDSGCWYGMEIRPQRHDEKTLVQSQVNAYNA 

DMI DP FQLGLLWFLATQEVLRKRWTAKI SMPAI LI ALLVLVFGGIT YTDVLRYVI LV 

GAAFAESNSGGDWHLALMAT FKI QPVFiyiVAS FLKARWTNQENILLMLAAVFFQMAYY 

DARQILLWEIPDVXNSIAVAWMILRAITFTTTSNVWPLLALLTPGLRCLNLDW 

LLMVGIGSLIREKRSAA7UCKKGASLLCLALASTGLFNPMILAAGLIACDPNRKRGWPA 

TEVMTAVGLMFAI VGGLAELDI DSMAI PMT I AGLMFAAFVI SGKSTDMWI ERTADI SW 

ES DAE I T GS S ERVDVRLDDDGN FQLMND P GAPWKI WMLRMAC LAI S AYT PWAI L P S W 

GFWITLQYTKRGGVXWDTPSPKEYKKGDTTTGWRIMTRGLLGSYQAGAGVMVEGVXH 

TLWHTTKGAALMSGEGRLDPYWGSWEDRLCYGGPWKLQHKWNGQDEVQMIVVEPGKN 

VKNVQTKPGVFKTPEGEIGAVTLDFPTGTSGSPIVDKNGDVIGLYGNGVIMPNGSYIS 

AIVQGERMDEPIPAGFEPEMLRKKQITVLDLHPGAGKTRRILPQIIKEAINRRLRTAV 

LAPTRWAAEMAEALRGLPIRYQTSAVTREHNGNEIVDVMCHATLTHRLMSPHRVPNY 

NLFVMDEAHFTDPASIAARGYISTKVELGEAAAI FMTATPPGTSDPFPESNSPISDLQ 

TEIPDRAWNSGYEWITEYIGKTWEVPSVKMGNEIALCLQRAGKKVVQLNRKSYETEY 

PKCKNDDWDFVITTDISEMGANFKASRVIDSRKSVKPTIITEGEGRVILGEPSAVTAA 

SAAQRRGRIGRNPSQVGDEYCYGGHTNEDDSNFAHWTEARIMLDNINMPNGLIAQFYQ 

PEREKVTTMDGEYRLRGEERKNFLELLRTADLPVWLAYKVAAAGVSYHDRRWCFDGPR 

TNTILEDNNEVEVITKLGERKILRPRWIDARVYSDHQALKAFKDFASGKRSQIGLIEV 

LGKMPEHFMGKTWEALDTMYWATAEKGGRAHRMALEE 

VFFLLMQRKGIGKIGLGGVAALGVATFFCWMAEVPGTKIAGMLLLSLLmiVLIPEPEK 
QRSQTDNQLAVFLICVMTLVSAVAANEMGWLDKTKSDISSLFGQRIEVKENFSMGEFL 



LDLRPATAWSLYAVTTAVLTPLLKHLITSDYINTSLTSINVQASALFTLARGFPFVDV 

GVSALLLAAGCWGQWLTVTWAATLLFCHYAYMVPGWQAEAMRSAQRRTAAGIM 

WDG I VAT D VP E LE RT T P I MQ KKVGQ I ML I L VS LAAVWN P S VKT VREAG I L I T AAAV 

TLWENGASSVWNATTAIGLCHIMRGGWLSCLSITWTLIKNMDKPGLKRGGAKGRTLGE 

VWKERLNQMTKEEFTRYRKEAIIEVDRSAAKHARKEGNVTGGHPVSRGTAKLRWLVER 

RFLEPVGKVIDLGCGRGGWCYYMATQKRVQEVRGYTKGGPGHEEPQLVQSYGWNIVTM 

KSGVDVFYRPSECCDTLLCDIGESSSSAEVEEHRTIRVLEMVEDWLHRGPREFCVKVL 

CPYMPKVIEKMELLQRRYGGGLVRNPLSRNSTHEMYWVS RASGNWHSVNMTSQVLLG 

RMEKRTWKGPQYEEDVNLGSGTRAVGKPLLNSDTSKIKNRIERLRREYSSTWHHDENH 

P YRTWN YHGS YDVKPTGS AS SLVNGWRLLS KPWDT I TNVTTMAMTDTT P FGQQRVFK 

EKVDTKAPEPPEGVKYVLNETTNWLWAFLAREKRPRMCSREEFIRKWSNAALGA 

EQNQWRSAREAVEDPKFWEMVDEEREAHLRGECHTCIYNMMGKREKKPGEFGKAKGSR 

AIWFMWLGARFLEFEALGFLNEDHWLGRKNSGGGVEGLGLQKLGYILREVGTRPGGKI 

YADDTAGWDTRITRADLENEAKVLELLDGEHRRLARAI I ELT YRHKWKVMRPAADGR 

TVMDVI S REDQRGS GQWT YALNT FTNLAVQLVRMMEGEGVI GPDDVEKLTKGKGPKV 

RTWLFENGEERLSRMAVSGDDCWKPLDDRFATSLHFLNAMSKVRKDIQEWKPSTGWY 

DWQQVPFCSNHFTELIMKDGRTLWPCRGQDELVGRARISPGAGWNVRDTACLAKSYA 

Q1WLLLYFHRRDLRLMANAICSAVPVNWPTGRTTWSIHAGGEWMTTEDMLEVWNRW 

I EENEWMEDKT PVEKWSDVP YS GKREDI WCGS LI GTRARATWAENI QVAINQVRAI I G 

DEKYVDYMSSLKRYEDTTLVEDTVL" 

ORIGIN 

Query Match 66.0%; Score 1003; DB 10; Length 10945; 

Best Local Similarity 78.9%; Pred. No. 0; 

Matches 1195; Conservative 0; Mismatches 320; Indels 0; Gaps 0; 
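The summary figures can be cross-checked with simple arithmetic. The sketch below assumes that "Best Local Similarity" is the number of matches divided by the number of aligned columns (matches plus mismatches plus indels); the report does not state the formula, but the printed values are consistent with it. The function name `percent_identity` is hypothetical.

```python
def percent_identity(matches: int, mismatches: int, indels: int = 0) -> float:
    """Percentage of aligned columns that are identical."""
    aligned = matches + mismatches + indels
    if aligned == 0:
        raise ValueError("alignment has no columns")
    return 100.0 * matches / aligned


# Values taken from the summary above (AY701413).
matches, mismatches = 1195, 320

print(matches + mismatches)                             # aligned columns
print(round(percent_identity(matches, mismatches), 1))  # percent identity
```

The aligned-column count, 1515, equals the span of query positions 1 to 1515 shown in the alignment, which is what an ungapped alignment would give.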

Qy 1 C GGAAT T CAGCTT CAACT GT TTAGGAAT GAGCAACAGGGACT TC CT GGAGGGAGTGTCT G 60
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I
Db 914 CAGCCTACAGCTT CAACT GC CTT GGAAT GAGCAACAGAGACT TCT T GGAGGGAGTAT CT G 973

Qy 61 GAGCT ACAT GGGT T GAT CT GGT ACT GGAAGGAGACAGTT GT GT GACCATAAT GT CAAAAG 120
I I I I II I I I I I I III I I I I II I I I I I Mill II I I II I II I I I I I II I
Db 974 GAGCAACAT GGGT GGATTT GGTT CT C GAAGGCGAC AGCT GCGT GACT AT CAT GT CCAAGG 1033

Qy 121 ACAAGC CAACC AT T GAT GT CAAAAT GAT GAACAT GGAAGC AGCTAAT CT C GCAGAT GT GC 180
I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I II II II II I I I I I II I
Db 1034 ACAAGCCCACCATTGATGTGAAGATGATGAAT AT GGAGGCT GCCAACCT GGCAGAGGTCC 1093

Qy 181 GT AGCT ACT GCT ACTT AGCTT CGGT CAGTGAT CT GTCAACAAAAGC C GC GT GT C CAAC CA 240
I II II I I I I I II III I Mill II I I I II II Mill II II II I I II
Db 1094 GCAGTTATTGCTATTTGGCTACCGTCAGCGATCTCTCCACCAAAGCTGCATGCCCGACCA 1153

Qy 241 T GGGT GAAGCT C ACAAC GAGAAAAGAGC CGACCCT GC CT T T GTT T GCAAGCAAGGCGT C G 300
I I I I I I I I I I I I II I I II III I II II I II II Mill MM II I II II I
Db 1154 T GGGAGAAGCT C ACAAC GACAAAC GT GCT GACCCGGCT T T T GTGT GC AGACAAGGAGT GG 1213

Qy 301 T AGACAGAGGAT GGGG GAAT GGATGC GGACTGT T T GGAAAGGGGAGC ATT GAC ACAT GT G 360
I I II II I II I II II II II I I I I II II II II I II II II I II I II I II II I I
Db 1214 T GGACAGAGGCT GGGG CAAC GGCTGC GGACT AT TT GGCAAAGGAAGC AT T GAC ACAT GC G 1273

Qy 361 CAAAGTTT GCCT GT ACAAC CAAGGCAACTGGT T GGATT AT C C AGAAGGAAAACAT CAAGT 420
I I I II II II II I II II II I I I I I I I M ill II I I I II I II I II I I
Db 1274 CCAAATTT GCCT GCTC CAC CAAGGCAACAGGAAGAACCAT CTT GAAAGAGAACAT CAAGT 1333

Qy 421 AC GAGGTT GCC AT ATT TGT GCAT GGCC C GAC GACT GT C GAAT CAC AT GGCAATT ATT CAA 480
I II II II I II II II I II II I II II Mill II II I I I II II II II I
Db 1334 AT GAAGT GGCC AT CTTTGT CC AT GGAC CAACCACT GT GGAGT CGC AT GGAAACT ACT C C A 1393



Qy 481 CAC AGAT AGGGGCT AC CCAAGC AGGAAGGTT CAGCATAACT C CAT C GGC AC CAT C CT ACA 540 

I II I I I I I II I I I I I I I I I I I I I I I I II I M I I I I I I I I I I I I I I I I I 
Db 1394 CACAGATTGGGGCCACTCAGGCAGGGAGATTCAGCATCACTCCTGCGGCGCCTTCATACA 1453 

Qy 541 CGCTGAAGTT GGGTGAGTATGGT GAGGTCACAGTTGACT GT GAGCCACGGTCAGGAAT AG 600 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 
Db 1454 CACTAAAGCT T GGAGAAT ATGGAGAAGT GAC AGT GGACT GT GAAC CACGGT C AG GGAT T G 1513 

Qy 601 ACACT AGC GCTT ACT AC GT TAT GT C AGT GGGT GC GAAGT CCTTCT T GGTT C ACC GAGAAT 660 

I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1514 ACACCAATGCCTACTACGTGATGACTGTTGGAACAAAGACGTTTTTGGTCCATCGTGAGT 1573 

Qy 661 GGT TT ATGGACCT GAACCT T C CAT GGAGT AGC GCT GGAAGC ACAACGT GGAGGAAC C GGG 720 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
Db 1574 GGTTCATGGACCTCAACCTCCCTTGGAGCAGTGCTGGAAGTACTGTGTGGAGGAATAGAG 1633 

Qy 721 AAACACT GAT GGAGTTTGAAGAACCT C AT GCC AC CAAACAAT CT GT CGT AGCTCT AGGGT 780 

III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1634 AGACGTTAAT GGAGTTTGAGGAACC AC AC GCCACAAAGCAGT CT GT GAT AGC ACT GGGCT 1693 

Qy 781 CGCAGGAAGGTGCCTTGCACCAAGCTCTGGCTGGAGCAATTCCTGTTGAGTTCTCAAGCA 840 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1694 CACAAGAGGGAGCT CT GC AT CAAGCT TT GGCT GGAGCC AT C C CT GT GGAAT T TT CAAGCA 1753

Qy 841 ACACT GT GAAGTT GACAT CAGGACAT CT GAAGT GT AGGGT GAAGAT GGAGAAGT T GC AGC 900 

I I I I I I I I I I I I I I I II II I I I I I I I I II I I I I I I I I I I I I I I I II I I I I I I 
Db 1754 ACACT GT TAAGT T GACGT C GGGT CAT CT GAAGT GT AGAGT GAAGAT GGAAAAAT T GCAGT 1813 

Qy 901 T GAAGGGAACAAC AT ATGGT GT AT GCT CAAAAGCAT T CAAAT TC GCT AGGACT C C C GCT G 960 

I I I I I I I I I I I I I II I I II II I I I I I II I I I I I II I I I I I I I I I I I I 
Db 1814 TGAAGGGAACAACCTACGGCGTCTGTTCAAAGGCTTTCAAGTTTCTTGGGACTCCCGCAG 1873 

Qy 961 ACACT GGT CATGGAAC GGT GGT GCT GGAACT GCAGT AT ACC GGAAAAGACGGGC CTT GCA 1020 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I II I I I I I I I I I 
Db 1874 AC ACAGGT C ACGGCACTGT GGT GT T GGAATT GCAGT AC ACT GGCAC GGATGGAC CTT GC A 1933 

Qy 1021 AAGTGCCCATTTCTTCTGTGGCTTCCCTGAACGACCTTACACCCGTTGGAAGGCTGGTGA 1080 

I I I I I I I I I I I I I I II I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 
Db 1934 AAGTT C CCAT CT CGT CAGT GGCT T C CTT GAAC GACCT AACACCGGT GGGCAGAT T GGT CA 1993 

Qy 1081 CTGTGAATCCATTTGTGTCTGTGGCTACGGCCAACTCGAAGGTTTTGATTGAACTCGAAC 1140

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1994 CT GTTAAC CCTTT T GTTT CAGT GGCCAC GGC CAAT GCCAAGGTC CT GAT T GAACT GGAAC 2053 

Qy 1141 CC C CGTT T AGT GACT CTT AC AT C GT GGT GGGGAGAGGAGAAC AGC AGAT AAAC CAC CAC T 1200 

I II III I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I 
Db 2054 C ACCTTT TGGAGACT CAT ACAT AGT GGT AGGCAGAGGAGAACAAC AGAT CAAT CAC CAT T 2113 

Qy 1201 GGCACAAAT CTGGGAGCAGTATTGGAAAGGCTTTCACCACT ACACT CAGAGGAGCT CAAC 1260 

I I I I II Mill I I I I I II II II II II II II II I I II II I II II 

Db 2114 GGCATAAGTCTGGAAGCAGCATCGGCAAAGCCTTTACAACCACTCTCAAAGGGGCGCAGA 2173 

Qy 1261 GACTTGCAGCTCTTGGAGACACTGCCTGGGATTTTGGATCAGTCGGAGGGGTTTTCACCT 1320 

II I II I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 

Db 2174 GATTAGCCGCTCTAGGAGACACAGCTTGGGACTTTGGATCAGTTGGAGGGGTGTTCACCT 2233 



Qy 1321 CGGTAGGGAAAGC CAT ACAC CAAGT TTT T GGAGGAGCCT T T AGAT CACT CTTT GGAGGGA 1380
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 2234 CAGTAGGGAAGGCTGTCCATCAAGTGTTCGGTGGAGCGTTCCGCTCACTGTTTGGAGGTA 2293



Qy 1381 TGTCCTGGATCACACAGGGGCTTCTGGGAGCTCTTCTGCTGTGGATGGGAATTAACGCCC 1440 

I I I I I I I II I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2294 TGTCCTGGATAACGCAGGGATTGCTGGGGGCTCTTCTGTTGTGGATGGGCATCAATGCTC 2353 

Qy 1441 GTGACAGGTCAATTGCTATGACGTTCCTTGCGGTTGGAGGAGTCTTGCTCTTCCTTTCGG 1500 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2354 GTGACAGGTCCATAGCTCTCACGTTTCTCGCAGTTGGAGGAGTTCTGCTCTTCCTCTCCG 2413 

Qy 1501 TCAACGTCCATGCTG 1515 

I I I II I I I I I I I 
Db 2414 TGAACGTGCACGCTG 2428 
AY262283 10984 bp RNA linear VRL 29-OCT-2003

West Nile virus isolate KN3829 polyprotein gene, complete cds . 
AY262283 

AY262283.1 GI:30230630

West Nile virus (WNV) 
West Nile virus 

Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;
Flavivirus; Japanese encephalitis virus group.

1 (bases 1 to 10984) 

Charrel,R.N., Brault,A.C., Gallian,P., Lemasson,J.-J., Murgue,B.,
Murri,S., Pastorino,B., Zeller,H., de Chesse,R., de Micco,P. and de
Lamballerie,X.

Evolutionary relationship between Old World West Nile virus 
strains. Evidence for viral gene flow between africa, the middle 
east, and europe 

Virology 315 (2), 381-388 (2003) 
14585341 

2 (bases 1 to 10984) 
Brault,A.C. and de Lamballerie,X.
Direct Submission 

Submitted (25-MAR-2003) Division of Vector-Borne Infectious 
Diseases, Centers for Disease Control and Prevention, P.O. Box 
2087, Fort Collins, CO 80522, USA 

Location/Qualifiers 

1..10984

/organism="West Nile virus"
/mol_type="genomic RNA"
/isolate="KN3829"

/specific_host="Culex univittatus"
/db_xref="taxon:11082"
1..60
61..10362
/codon_start=1
/product="polyprotein"
/protein_id="AAP20887.1"
/db_xref="GI:30230631"

/translation="MSKKPGGPGKSRAVNMLKRGMPRVLSLIGLKRAMLSLIDGKGPI



RFVLALLAFFRFTAIAPTRAVLDRWRGVNKQTAMKHLLSFKKELGTLTSAINRRSSKQ 
KKRGGNTGI AAMI GLI AS VGAVTLSN FQGKVMMTVNATDVTDVT T I PTAAGKNLCI VR 
AMDVGYMCDDTITYECPYLSAGNDPEDIDCWCTKSAVYVRYGRCTKTRHSRRSRRSLT 
VQTHGESTLANKKGAWMDSTKATRYLVXTESWILRNPGYALVAAVIGWMLGSNTMQRV 
VFWLLLLVAPAYSFNCLGMSNRDFLEGVSGATWVDLVLEGDSCVTIMSKDKPTIDVK 
MMNMEAANLAEWSYCYLATVSDLSTKAACPTMGEAH^ 

NGCGLFGKGSIDTCAKFACSTKATGRTILKENIKYEVAIFVHGPTTVESHGNYSTQIG 
ATQAGRFS I T P AAP S YT LKLGE YGEVT VDCE P RS GI DTNAY YVMTVGT KT FLVHREW F 
MDLNLPWSSAGSTVWRNRETLMEFEEPHATKQSVIALGSQEGALHQALAGAI PVEFSS 
NTVKLTSGHLKCRVKMEKLQLKGTTYGVCSKAFKFLGTPADTGHGTVVLELQYTGTDG 
PCKVPISSVASLNDLTPVGRLVTVNPFVSVATANAKVLIELEPPFGDSYIWGRGEQQ 
INHHWHKSGSSIGKAFTTTLKGAQRLAALGDTAWDFGSVGGVFTSVGKAVHQVFGGAF 
RSLFGGMSWITQGLLGALLLWMGINARDRSIALTFLAVGGVLLFLSVNVHADTGCAID 
I S RQELRCGSGVFI HNDVEAWMDRYKYYPET PQGLAKI I QKAHKEGVCGLRS VS RLEH 
QMWESVKDELNTLLKENGVDLSVVVEKQEGMYKSAPKRLTATTEKLEI GWKAWGKS I L 
FAPELANNTFWDGPETKECPTQNRAWNSLEVEDFGFGLTSTRMFLKVRESNTTECDS 
KIIGTAVKNNLAIHSDLSYWIESRLNDTWKLERAVLGEVKSCTWPETHTLWGDGILES 
DLI I PVTLAGPRSNHNRRPGYKTQNQGPWDEGRVEIDFDYCPGTTVTLSESCGHRGPA 
TRTTTESGKLITDWCCRSCTLPPLRYQTDSGCWYGMEIRPQRHDEKTLVQSQVNAYNA 
DMIDPFQLGLLWFIATQEVLRKRWTAKISMPAILIALLVLVFGGITYADVLRYVILV 
GAAFAE SN S GGDWH LALMAT FK I QP VFMYAS FLKARWTNQ EN I L LMLAAVFFQMAYH 
DARQILLWEIPDVLNSLAVAWMILRAITFTTTS^mAAPLLALLTPGLRCLNLDVYRIL 
LLMVGIGSLIREKRSAAAKKKGASLLCLALASTGLFNPMILAAGLIACDPNRKRGWPA 
TEVMTAVGLMFAIVGGLAELDIDSMAI PMTI AGLMFAAFVI SGKSTDMWI ERTADI SW 
ESDAEITGSSERVDVRLDDDGNFQLMNDPGAPWKIWMLRMACLAISAYTPWAILPSW 
GFWITLQYTKRGGVLWDTPSPKEYKKGDTTTGVYRIMTRGLLGSYQAGAGVMVEGVFH 
TLWHTTKGAALMSGEGRLDPYWGSVKEDRLCYGGPWKLQHKWNGQDEVQMIVVEPGKN 
VKNVQTKPGVFKTPEGEI GAVTLDFPTGTSGS PI VDKNGDVI GLYGNGVIMPNGS YI S 
AIVQGERMDEPIPAGFEPEMLRKKQITVLDLHPGAGKTRRILPQIIKEAINRRLRTAV 
IAPTRWAAEMAEALRGLPIRYQTSAVTREHNGNEIVDVMCHATLTHRLMSPHRVPNY 
NLFVMDEAHFTDPAS IAARGYI STKVELGEAAAI FMTATPPGTSDPFPESNS PI SDLQ 
TEIPDRAWNSGYEWITEYIGKTVWFVPSVKMGNEIALCLQRAGKKWQLNRKSYETEY 
PKCKNDDWDFVITTDISEMGANFKASRVIDSRKSVKPTIITEGEGRVILGEPSAVTAA 
SAAQRRGRIGRNPSQVGDEYCYGGHTNEDDSNFAHWTEARIMLDNINMPNGLIAQFYQ 
PEREKVYTMDGEYRLRGEERKNFLELLRTADLPVWLAYKVAAAGVSYHDRRWCFDGPR 
TNT I LEDNNEVEVITKLGERKI LRPRWI DARVYS DHQALKAFKDFAS GKRSQI GLI EV 
LGKMPEHFMGKTWEALDTMYWATAEKGGRAHRMALEELPD^ 

VFFLLMQRKGIGKIGLGGVVLGVATFFCWMAEVPGTKIAGMLLLSLLLMIVLIPEPEK 

QRSQTDNQLAVFLICVMTLVSAVAANEMGWLDKTKSDISSLFGQRIEVKENFSMGEFL 

LDLRPATAWSLYAVTTAVLTPLLKHLITSDYINTSLTSINVQASALFTLARGFPFVDV 

GVS ALL LAAGCWGQVT LT VTVT AAT L L FC H YAYMVP GWQAEAMRS AQRRTAAG I MKN A 

WDGIVATDVPELERTTPIMQKKVGQIMLILVSLAAVWNPSVKTVREAGILITAAAV 

TLWENGASSVWNATTAIGLCHIMRGGWLSCLSITWTLIKNMDKPGLKRGGAKGRTLGE 

VWKERLNQMTKEEFTRYRKEAI I EVDRSAAKHARKEGNVTGGHPVSRGTAKLRWLVER 

RFLEPVGKVIDLGCGRGGWCYYMATQKRVQEVRGYTKGGPGHEEPQLVQSYGWNIVTM 

KSGVDVFYRPSECCDTLLCDIGESSSSAEVEEHRTIRVLEMVEDWLHRGPREFCVKVL 

CPYMPKVIEKMELLQRRYGGGLVRNPLSRNSTHEMYWSRASGNVVHSVN1OTSQVLLG 

RMEKRTWKGPQYEEDVNLGSGTRAVGKPLLNSDTSKIKNRIERLRREYSSTWHHDENH 

PYRTWNYHGSYDVKPTGSASSLWGWRLLSKPWDTITNVTTMAMTDTTPFGQQRVFK 

EKVT)TKAPEPPEGVKYVIjNETTNWLWAFIAREKRPRMCSREEFIRKWSNAALGAM 

EQNQWRSAREAVEDPKFWEMVDEEREAHLRGECHTCIYNMMGKREKKPGEFGKAKGSR 

AIWFMWLGARFLEFEALGFLNEDHWLGRKNSGGGVEGLGLQKLGYILREVGTRPGGKI 

YADDTAGWDTRITRADLENEAI^ELLDGEHRRIARAIIELTYRHKVVKVM 

TVMDVISREDQRGSGQVWYALNTFTNLAVQLVRMMEGEGVIGPDDVEKLTKGKGPKV 

RTWLFENGEERLSRMAVSGDDCVVKPLDDRFATSLHFLNAMSKVRKDIQEWKPSTGWY 

DWQQVP FCSNHFTELIMKDGRT L WP C RGQD E LVG RARI S P GAGWNVRDT AC LAKS YA 

QMWLLLYFHRRDLRLMANA1CSAVPVNWPTGRTTWSIHAGGEWMTTEDMLEVWNRW 





IEENEWMEDKTPVEKWSDVPYSGKREDIWCGSLIGTRARATWAENIQVAINQVRAIIG 




DEKYVDYMSSLKRYEDTTLVEDTVL" 


mat_peptide 61..429
/product="capsid"
mat_peptide 430..705
/product="prM"
mat_peptide 706..930
/product="M"
mat_peptide 931..2433
/product="envelope"
mat_peptide 2434..3492
/product="NS1"
mat_peptide 3493..4182
/product="NS2A"
mat_peptide 4183..4575
/product="NS2B"
mat_peptide 4576..6432
/product="NS3"
mat_peptide 6433..6879
/product="NS4A"
mat_peptide 6880..7644
/product="NS4B"
mat_peptide 7645..10359
/product="NS5"
3'UTR 10363..10984
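The mat_peptide coordinates for AY262283 can be sanity-checked against the polyprotein coding region (61..10362). The sketch below is illustrative only: the names `MAT_PEPTIDES`, `CDS` and `span_length` are hypothetical, and the coordinates are simply copied from the feature table. It checks that the mature peptides tile the coding region without gaps and that each span is a whole number of codons.

```python
# Coordinates copied from the mat_peptide features listed above (1-based, inclusive).
MAT_PEPTIDES = [
    ("capsid", 61, 429),
    ("prM", 430, 705),
    ("M", 706, 930),
    ("envelope", 931, 2433),
    ("NS1", 2434, 3492),
    ("NS2A", 3493, 4182),
    ("NS2B", 4183, 4575),
    ("NS3", 4576, 6432),
    ("NS4A", 6433, 6879),
    ("NS4B", 6880, 7644),
    ("NS5", 7645, 10359),
]
CDS = (61, 10362)


def span_length(start: int, end: int) -> int:
    """Length in nucleotides of an inclusive 1-based span."""
    return end - start + 1


contiguous = True
whole_codons = True
total_codons = 0
expected_start = CDS[0]
for name, start, end in MAT_PEPTIDES:
    length = span_length(start, end)
    contiguous = contiguous and start == expected_start
    whole_codons = whole_codons and length % 3 == 0
    total_codons += length // 3
    expected_start = end + 1
    print(f"{name:9s} {start:6d}..{end:<6d} {length:5d} nt {length // 3:4d} codons")

# Bases of the coding region not covered by any mature peptide.
leftover = span_length(*CDS) - 3 * total_codons
print(f"contiguous={contiguous} whole_codons={whole_codons} "
      f"total_codons={total_codons} leftover_nt={leftover}")
```

The three bases left over after NS5 (10360..10362) are presumably the stop codon, although the record does not annotate them.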



ORIGIN 

Query Match 65.8%; Score 999.8; DB 10; Length 10984;

Best Local Similarity 78.7%; Pred. No. 0;

Matches 1193; Conservative 0; Mismatches 322; Indels 0; Gaps 0;



Qy 1 CGGAATT CAGCTT CAACT GT TTAGGAAT GAGCAACAGGGACT T C CT GGAGGGAGT GT CT G 60
I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I MINI I I I I I I I I I I I I I I
Db 920 CAGCCTACAGCTTCAACTGT CTT GGAAT GAGCAACAGAGACTT CTT GGAGGGAGT ATCTG 979

Qy 61 GAGCTACATGGGTT GATCTGGTACTGGAAGGAGACAGTT GTGTGACCATAAT GTCAAAAG 120
I I I I I I I I I I I I III I I I I II I I I II I II I I II I I I I I II I I II I II I
Db 980 GAGCAACAT GGGT GGATTT GGT T CT CGAAGGCGACAGCT GC GT GACT AT CAT GT CTAAGG 1039

Qy 121 ACAAGCCAAC CATT GAT GT CAAAAT GAT GAAC AT GGAAGCAGCT AATCT CGC AGAT GT GC 180
I I I I I I I I I I II I I I I I I II I I II I I I I I I I I I II II II II I I I I I II I
Db 1040 ACAAGCCTACCATT GAT GTGAAGAT GATGAATATGGAGGCTGCCAACCTGGCAGAGGTCC 1099

Qy 181 GTAGCTACTGCTACTTAGCTTCGGTCAGTGATCTGTCAACAAAAGCCGCGTGTCCAACCA 240
I II II Mill II III I I I I I I I I I I I II II I I I I I II II M I I I I
Db 1100 GCAGT T ATTGCTATTT GGCT AC C GT CAGC GAT CT CT CCAC CAAAGCT GCAT GCC C GAC C A 1159

Qy 241 TGGGTGAAGCTCACAACGAGAAAAGAGCCGACCCTGCCTTTGTTTGCAAGCAAGGCGTCG 300
I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I
Db 1160 TGGGAGAAGCT CACAAT GACAAACGT GCT GAC CC AGCT TTT GT GT GCAGACAAGGAGT GG 1219

Qy 301 TAGACAGAGGATGGGGGAATGGAT GCGGACT GTTT GGAAAGGGGAGCATTGACACAT GTG 360
I I I I I I II I I I I I II II I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I
Db 1220 TGGACAGGGGCTGGGGCAACGGCTGCGGACTATTTGGCAAAGGAAGCATTGACACATGCG 1279

Qy 361 CAAAGTT T GCCT GT ACAACCAAGGCAACT GGT T G GATT AT CCAGAAGGAAAACAT CAAGT 420
I I I I I I I I I I I 11111111111111 II II I I I I I I II I I I I I I I
Db 1280 CCAAATTT GCCT GCTCCACCAAGGCAACAGGAAGAACCATCTTGAAAGAGAATATCAAGT 1339

Qy 421 AC GAGGT T GC CAT AT TT GTGCAT GGCC C GAC GACT GT CGAAT C ACAT GGCAATT ATT CAA 480
I II II I I I I I I I I I I I I I I I II II I I I I I II II I I I I I II II II I
Db 1340 AT GAAGT GGC CAT CT T T GTC CAT GGAC CAACC ACT GT GGAGT C GC AT GGAAACT ACT CCA 1399

Qy 481 C AC AGAT AGGG GCT AC CCAAGCAGGAAGGT T C AGC ATAACT C CAT C GGC AC CAT C CT ACA 540
I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I II I I I I I I II II I I I I
Db 1400 CACAGATTGGGGCCACTCAGGCAGGGAGATTCAGCATCACTCCTGCGGCGCCTTCATACA 1459

Qy 541 C GCT GAAGTT GGGTGAGTAT GGT GAGGT CAC AGT T GACT GT GAGCC AC GGT CAGGAAT AG 600
I II III I II II II I I I II II I I I I I I I I I II I I I I II I I I I I I I II I
Db 1460 CACTAAAGCTTGGAGAATATGGAGAAGTGACAGT GGACTGT GAACCACGGT CAGGGATTG 1519

Qy 601 ACACT AGC GCTT ACT AC GTTAT GT CAGT GGGT GC GAAGT CCTT CTT GGTT C ACCGAGAAT 660
I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 1520 ACACCAATGCTTACTACGTGATGACTGTTGGAACAAAGACGTTTTTGGTCCATCGTGAGT 1579

Qy 661 GGTTTATGGACCTGAACCTTCCATGGAGTAGCGCTGGAAGCACAACGTGGAGGAACCGGG 720
I I I I I I I I I I I I I I II I II I I I I I II I I I I I I I I II I II I I I I I I I I I
Db 1580 GGTT CATGGACCTCAACCTCCCTTGGAGCAGTGCTGGAAGT ACT GTGTGGAGGAACAGAG 1639

Qy 721 AAACACT GATGGAGTTTGAAGAACCT CATGCCACCAAACAATCT GT CGTAGCTCTAGGGT 780
III I I I I I I I I I I I I I I I I I I I I 1 1 I I I I I I I I I I I I I I I I I I I
Db 1640 AGAC GTTAAT GGAGT T T GAGGAACC ACACGC C ACAAAGCAGT CT GT GAT AGC AT T GGGCT 1699

Qy 781 CGCAGGAAGGTGCCTTGCACCAAGCTCTGGCTGGAGCAATTCCTGTTGAGTTCTCAAGCA 840
I II II II II I I I I I I I I I I II I I I I I I I I II I I I I I II II I I I I II I
Db 1700 CACAAGAGGGAGCTCTGCATCAAGCTTTGGCTGGAGCCATCCCTGTGGAATTTTCAAGCA 1759

Qy 841 ACACT GT GAAGT T GACAT CAGGACAT CT GAAGT GT AGGGT GAAGAT GGAGAAGT T GCAGC 900
I I I I I I I I I I I I I I I II II III I I I I I I I I I I I I I I I I I I I I I II I I I I I I
Db 1760 ACACT GT CAAGTT GACGTCGGGT CATTT GAAGT GTAGAGT GAAGAT GGAAAAATTGCAGT 1819

Qy 901 T GAAGGGAACAAC AT AT GGT GT AT GCT CAAAAGC ATTCAAAT T C GCT AGGACTC C C GCT G 960
I I I I I I I I I I I I I II II II II I I I I I II I I I I I II I I I I I I I I I II I
Db 1820 TGAAGGGAACAACCTACGGCGTCTGTTCAAAGGCTTTCAAGTTTCTTGGGACTCCCGCAG 1879

Qy 961 ACACT GGT CATGGAAC GGT GGT GCT GGAACT GC AGT AT ACCGGAAAAGACGGGC CT T GCA 1020
I I I I I I I I I II II I I I I II I I I I I I I I I I I I II II I II II I I I I I I I
Db 1880 AC ACAGGT CACGGC ACT GT GGT GTT GGAAT T GC AGT ACACT GGT AC GGAT GGAC CT T GCA 1939

Qy 1021 AAGTGCCCATTTCTTCTGTGGCTTCCCTGAACGACCTTACACCCGTTGGAAGGCTGGTGA 1080
I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 1940 AAGT T CC C AT CT C GT CAGTGGCTTCAT T GAAC GAC CTAAC AC CAGT GGGC AGAT T GGT C A 1999

Qy 1081 CT GT GAAT C C AT TTGT GT CT GT GGCT AC GGC CAACT CGAAGGT TTT GATT GAACT C GAAC 1140
I I I I II II I I I I I II I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I
Db 2000 CTGTCAACCCTTTTGTTTCAGTGGCCACGGCCAATGCCAAGGTCCTGATTGAACTGGAAC 2059

Qy 1141 CCCC GTT T AGTGACT CT T AC AT C GT GGT GGGGAGAGGAGAACAGCAGATAAACC AC C ACT 1200
I II III I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I
Db 2060 CACC CTT T GGAGACT CAT AC AT AGT GGT AGGCAGAGGAGAACAACAGAT CAATC AC CAT T 2119

Qy 1201 GGCACAAATCTGGGAGCAGTATTGGAAAGGCTTTCACCACT ACACT CAGAGGAGCT CAAC 1260
I I I I II II I I I I I I I I I I I I I II II II II II M I I I I III II II
Db 2120 GGCATAAGTCTGGAAGCAGCATTGGCAAAGCCTTTACAACCACTCTCAAAGGGGCGCAGA 2179



Qy 1261 GACTTGCAGCTCTTGGAGACACTGCCTGGGATTTTGGATCAGTCGGAGGGGTTTTCACCT 1320 

II I || I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2180 GATT AGC C GCTCT AG GAGAC AC AGCTT G GGACT TT GGAT CAGT T GGAGGGGT GT T C ACCT 2239 



Qy 1321 CGGT AGGGAAAGC CATACAC CAAGTT T T T GGAGGAGCCT TT AGAT C ACT CT T T GGAGGGA 1380 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I 
Db 2240 CAGTAGGGAAGGCTGTCCATCAAGTGTTCGGTGGAGCATTCCGCTCACTGTTCGGAGGTA 2299 

Qy 1381 TGTCCTGGATCACACAGGGGCTTCTGGGAGCTCTTCTGCTGTGGATGGGAATTAACGCCC 1440 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 2300 TGTCCTGGATAACGCAGGGATTGCTGGGGGCTCTTCTGTTGTGGATGGGCATCAATGCTC 2359 

Qy 1441 GTGACAGGTCAATTGCTATGACGTTCCTTGCGGTTGGAGGAGTCTTGCTCTTCCTTTCGG 1500 

I I I I I I II I I II III I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I II I 
Db 2360 GTGACAGGTCCATAGCTCTCACGTTTCTCGCAGTTGGAGGAGTCCTGCTCTTCCTCTCTG 2419 

Qy 1501 TCAACGTCCATGCTG 1515 

I I I I I I II I I I I 

Db 2420 TGAACGTGCACGCTG 2434 
AF404757 11029 bp ss-RNA linear VRL 23-JUL-2002

West Nile virus isolate WN Italy 1998-equine, complete genome. 
AF404757 

AF404757.1 GI:21929240

West Nile virus 
West Nile virus 

Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;
Flavivirus; Japanese encephalitis virus group.

1 (bases 1 to 11029) 

Lanciotti,R.S., Ebel,G.D., Deubel,V., Kerst,A.J., Murri,S.,
Meyer,R., Bowen,M., McKinney,N., Morrill,W.E., Crabtree,M.B.,
Kramer,L.D. and Roehrig,J.T.

Complete genome sequences and phylogenetic analysis of West Nile 
virus strains isolated from the United States, Europe, and the 
Middle East 

Virology 298 (1), 96-105 (2002) 
12093177 

2 (bases 1 to 11029) 

Deubel,V., Bowen,M., Meyer,R., McKinney,N. and Morrill,W.
Direct Submission 

Submitted (02-AUG-2001) Division of Vector-Borne Infectious
Diseases, Centers for Disease Control & Prevention, Rampart Road, 
Fort Collins, CO 80521, USA 

Location/Qualifiers 

1..11029

/organism="West Nile virus"
/mol_type="genomic RNA"
/isolate="WN Italy 1998-equine"
/specific_host="equine"
/db_xref="taxon:11082"
/country="Italy"
97..10398

/note="contains capsid, pre-membrane, envelope, NS1, NS2a,
NS2b, NS3, NS4a, NS4b, and NS5"
/codon_start=1
/product="polyprotein precursor"
/protein_id="AAM81753.1"
/db_xref="GI:21929241"

/translation="MSKKPGGPGKSRAWMLKRGMPRVLSLIGLKRAMLSLIDGKGPI 
RFVLALLAFFRFTAIAPTRAVTjDRWRGWKQTAMKHLLSFKKELGTLTSAINRRSSKQ 
KKRGGKTGIAVIVIIGLIASVGAWLSNFQGKvMOTWATDVTDV 

AMDVGYMCDDTITYECPVLSAGNDPEDIDCWCTKSAVYVRYGRCTKTRHSRRSRRSLT 
VQTHGESTLANKKGAWMDSTKATRYLVKTESWILRNPGYALVAAVIGWMLGSNTMQRV 
VFWLLLLVAPAYSFNCLGMSNRDFLEGVSGATWVT)LV^EGDSCVTIMSKDKPTIDVK 
MM^EAANLAEVT^SYCYLATVS 

NGCGLFGKGS I DTCAKFACSTKATGRTI LKENI KYEVAI FVHGPTTVESHGNYSTQIG 
ATQAGRFSITPAAPSYTLKLGEYGEVTVDCEPRSGIDTNAYYVMTVGTKTFLVHREWF 
MDLNLPWSSAGSTVWRNRETLMEFEEPHATKQSVIALGSQEGALHQALAGAIPVEFSS 
NTVT<LTSGHLKCRvl<MEKLQLKGTTYGVCSKAFKFLGTPADTGHGTVVLELQYTGTDG 
P C KVP I S S VAS LN DLT P VGRL VT VN P FVS VAT AN AKVL IELEPPFGDSYI WGRGEQQ 
INHHWHKSGS S I GKAFTTTLKGAQRLAALGDTAWDFGSVGGVFTSVGKAVHQVFGGAF 
RSLFGGMSWITQGLLGALLLWMGINARDRSIALTFI^^ 

ISRQELRCGSGVFIHNDVT^VWMDRYKYYPETPQGLAKIIQKAHKEGVCGLRSVSRLEH 

QlWESVl<DELNTLLKENGVT)LSvWEKQEGMYKSAPKRLTATTEKLEIGWKAWGKSIL 

FAPELANNTFVVDGPETKECPTQNRAVmSLEV^DFGFGLTSTRMFLKVRESNTTECDS 

KIIGTAVT<NNIjAIHSDLSYWIESRLNDTWKLERAvXGEv1<SCTWPETHTLWGDGILES 

DLIIPVTLAGPRSNHNRRPGYKTQNQGPWDEGRVEIDFDYCPGTTVTLSESCGHRGPA 

TRTTTESGKLITDWCCRSCTLPPLRYQTDSGCWYGMEIRPQRHDEKTLVQSQVNAYNA 

DMI DPFQLGLLWFLATQEVLRKRWTAKI SMPAI LI ALLVLVFGGIT YTDVLRYVI LV 

GAAFAESNSGGDVVTiLALMATFKIQPVFMVAS FLKARWTNQENILLMLAAVFFQMAYH 

DARQILLWEIPDVT,NSIAVAWMILRAITFTTTSNVWPLIJVLLTPGLR 

LLMVGIGSLIREKRSAAAKKKGASLLCLALASTGLFNPMILAAGLIACDPNRKRGWPA 

TEVMTAVGLMFAI VGGLAELDI DSMAI PMTI AGLMFAAFVI SGKSTDMWI ERTADI SW 

E S DAE I T G S S ERVDVRL DD DGN FQLMN D P GAPWK I WML RMAC LAI S AYT PWAI L P S VI 

GFWITLQYTKRGGVXWDTPSPKEYKKGDTTTGWRIMTRGLLGSYQAGAGVMVEGVFH 

TLWHTTKGAAmSGEGRLDPYWGSVKEDRLCYGGPWKLQHKWNGQDEVQMIVVEPGKN 

VT<WQTKPGVFKTPEGEIGAVTLDFPTGTSGSPIVDKNGDVIGLYGNGVIMPNGSYIS 

AIVQGERMDEPIPAGFEPEMLRKKQITVLDLHPGAGKTRRILPQIIKEAINRRLRTAV 

LAP T RWAAEMAEAL RGL P I R YQT S AVT REHN GN E I VD VMCHAT LT H RLMS P H RVPN Y 

NLFVMDEAHFTDPASIAARGYISTKVELGEAAAIFMTATPPGTSDPFPESNSPISDLQ 

TEIPDRAWNSGYEWITEYIGKTWFVPSVKMGNEIALCLQRAGKKWQLNRKSYETEY 

PKCKNDDWDFVITTDISEMGANFKASRVIDSRKSVKPTIITEGEGRVILGEPSAVTAA 

SAAQRRGRIGRNPSQVGDEYCYGGHTNEDDSNFAHWTEARIMLDNINMPNGLIAQFYQ 

PEREKVTTMDGEYRLRGEERKNFLELLRTADLPVWLAYKVAAAGVSYHDRRWCFDGPR 

TNTI LEDNNEVEVITKLGERKI LRPRWI DARVYSDHQALKAFKDFASGKRSQI GLI EV 

LGKMPEHFMGKTWEALDTMYVVATAEKGGRAHR 

VFFLLMQRKGIGKIGLGGVVLGVATFFCWMAEVPGTKIAGMLLLSLLLMIVLIPEPEK 

QRSQTDNQLAVFLICVTOTLVSAVAANEMGWLDKTKSDISSLFGQRIEVKENFSMGEFL 

LDLRPATAWSLYAVTTAVXTPLLKHLITSDYINTSLTSI^QASALFTLARGFPFVDV 

GVSALLLAAGCWGQWLTVTWAATLLFCHYAYMVPGWQAEAMRSAQRRTAAGIM 

WDGI VATDVPELERTT P IMQKKVGQIMLI LVS LAAVVVN PS VKTVREAGI LITAAAV 

TLWENGASSVWNATTAIGLCHIMRGGWLSCLSITWTLIKNMDKPGLKRGGAKGRTLGE 

VWKERLNQMTKEEFTRYRKEAI I EVDRSAAKHARKEGNVTGGHPVSRGTAKLRWLVER 

RFLEPVGKVIDLGCGRGGWCYYMATQKRVQEVRGYTKGGPGHEEPQLVQSYGWNIVTM 

KSGVT)VFYRPSECCDTLLCDIGESSSSAEVEEHRTIRVXEMV^DWLHRGPREFCVKVL 

CPYMPKVIEKMELLQRRYGGGLVRNPLSRNSTHEMYWSRASGNWHSVT^ 

RMEKRTWKGPQYEEDVNLGSGTRAVGKPLLNSDTSKIKNRIERLRREYSSTWHHDENH 

PYRTWNYHGSYDVT<PTGSASSLWGWRLLSKPWDTITNVTTMAMTDTTPFGQQRVFK 

EKVTDTKAPEPPEGVTCWLNETTNWLW 



EQNQWRSAREAVEDPKFWEMVDEEREAHLRGECHTCIYNMMGKREKKPGEFGKAKGSR 

AIWFMWLGARFLEFEALGFLNEDHWLGRKNSGGGVEGLGLQKLGYILREVGTRPGGKI 

YADDTAGWDTRITRADLENEAKVLELLDGEHRRLARAIIELTYRHKWI^ 

TVMDVISREDQRGSGQVVTYALNTFTNLAVQLVRMMEGEGVIGPDDVEKLTKGKGPKV 

RTWLFENGEERLSRMAVSGDDCWKPLDDRFATSLHFLNAMSKVRKDIQEWKPSTGWY 

DWQQVPFCSNHFTELIMKDGRTLWPCRGQDELVGRARISPGAGWNVRDTACLAKSYA 

QMWLLLYFHRRDLRLMANAICSAVPVNWPTGRTTWSIHAGGEWMTTEDMLEVWNRW 

IEENEWMEDKTPVEKWSDVPYSGKREDIWCGSLIGTRARATWAENIQVAINQVRAIIG 

DEKYVDYMSSLKRYEDTTLVEDTVL" 



ORIGIN 



Query Match 65.8%; Score 999.8; DB 10; Length 11029; 

Best Local Similarity 78.7%; Pred. No. 0; 

Matches 1193; Conservative 0; Mismatches 322; Indels 0; Gaps 0; 

Qy 1 CGGAATT CAGCTT CAACT GT TT AGGAAT GAGCAACAGGGACT T C CT GGAGGGAGT GT CT G 60 

I I I I I I I I I I I II I I I I I I I I I I I I I I II I MINI I I I I I I I I I I I I I I I 
Db 956 CAGCCTACAGCTT CAACT GCCTT GGAAT GAGCAAC AGAGACT T CTT GGAGGGAGT GT CT G 1015 

Qy 61 GAGCT AC AT GGGTT GATCT GGT ACT GGAAGGAGAC AGTT GT GT GAC C ATAAT GT CAAAAG 120 

I I I I I I I I I I I I III I I I I II I I I I I I I I I I II I I I I I II I I I I I II I 
Db 1016 GAGCAACAT GGGT GGATTT GGT T CT C GAAGGCGAC AGCT GC GT GAC TAT CAT GT CCAAGG 1075 

Qy 121 ACAAGCCAACCATTGATGT CAAAATGATGAACAT GGAAGCAGCTAATCTCGCAGAT GTGC 180

I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I II II II II I I I I I II I 
Db 1076 ACAAGC C C ACCATTGAT GT GAAGAT GAT GAAT AT GGAGGCT GC CAACCT GGCAGAGGT C C 1135 

Qy 181 GTAGCTACTGCTACTTAGCTTCGGTCAGTGATCTGTCAACAAAAGCCGCGTGTCCAACCA 240 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1136 GCAGTTATTGCTATTTGGCTACCGTCAGCGATCTCTCCACCAAAGCTGCATGCCCGACCA 1195 

Qy 241 TGGGT GAAGCT CACAACGAGAAAAGAGC C GAC C CT GCCT TT GT T T GCAAGCAAGGC GT C G 300 

I I I I I I I I I I I I I I I II III I II I I I I I II I I I I I I I I I I I I I I II I 
Db 1196 TGGGAGAAGCT CACAAT GACAAACGT GCT GAC C CAGCTT TT GT GT GC AGACAAGGAGT GG 1255 

Qy 301 TAGACAGAGGAT GGGGGAATGGAT GC GGACT GT T T GGAAAGGGGAGCAT T GACAC AT GT G 360 

I I I I I I II I I I I I II II I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I 
Db 1256 TGGACAGGGGCTGGGGCAACGGCTGCGGACTATTTGGCAAAGGAAGCATTGACACATGCG 1315 

Qy 361 CAAAGT T T GCCT GT ACAACCAAGGCAACT GGT T GGATT AT C C AGAAGGAAAACAT CAAGT 420 

I I I I I I I I I I I I I I I I I I I I I I I I I II III I I I I I I I I I I I I I I 
Db 1316 CCAAAT T T GCCT GCT CCACCAAGGCAAC AGGAAGAACC ATCTTGAAAGAGAATAT CAAGT 1375 

Qy 421 ACGAGGTTGCCATATTTGTGCATGGCCCGACGACTGTCGAATCACATGGCAATTATTCAA 480 

I II II I I I I I I I I I I I I I I I II II I I I I I II II I I I I I II II II I 
Db 1376 ATGAAGTGGCCATCTTTGTCCATGGACCAACCACTGTGGAGTCGCATGGAAACTACTCCA 1435 

Qy 481 CAC AGAT AGGGGCT ACCCAAGCAGGAAGGTT CAGC ATAACT C CAT C GGCACC AT CCT AC A 540 

I I I I I I I I I I I I II II I I I I I II I I I I I I I I I I I I I I I I I II II I I I I 
Db 1436 CACAGAT T GGGGC CACT C AGGCAGGGAGATT CAGC AT C ACT CCT GC GGCGCCTT CAT AC A 1495 

Qy 541 CGCTGAAGTTGGGTGAGTATGGTGAGGTCACAGTTGACTGTGAGCCACGGTCAGGAATAG 600 

I II III I II II I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I II I 
Db 1496 CACTAAAGCTTGGAGAAT ATGGAGAAGT GACAGTGGACT GTGAACCACGGT CAGGGATT G 1555 



Qy 601 ACACT AGCGCTT ACT ACGTT AT GT CAGT GGGT GCGAAGT CCT T CT T GGT T CACC GAGAAT 660 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 



Db 1556 ACACTAAT GCCT ACT ACGT GAT GACT GT T GGAACAAAGAC GT T TT T GGTCC AT CGT GAGT 1615 

Qy 661 GGTTTATGGACCTGAACCTTCCATGGAGTAGCGCTGGAAGCACAACGTGGAGGAACCGGG 720 

I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I > I I I I I I I I I I I I 
Db 1616 GGTT CAT GGACCT CAACCT CCCTTGGAGCAGTGCTGGAAGT ACT GTGT GGAGGAACAGAG 1675 

Qy 721 AAACACT GATGGAGT TT GAAGAACCT C AT GC C AC CAAACAAT CT GT C GT AGCT CT AGGGT 780 

III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1676 AGAC GTT GAT GGAGT TT GAGGAACC ACAC GC C ACAAAGC AGT CT GT GAT AGCAT T GGGCT 1735 

Qy 781 CGCAGGAAGGTGCCTTGCACCAAGCTCTGGCTGGAGCAATTCCTGTTGAGTTCTCAAGCA 840 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 
Db 1736 CACAAGAGGGAGCT CTGCAT CAAGCTTT GGCTGGAGCCAT CCCT GTGGAATTTTCAAGCA 1795 

Qy 841 ACACTGTGAAGTTGACAT CAGGACATCT GAAGTGT AGGGT GAAGATGGAGAAGTTGCAGC 900 

I I I I I I I I I I I I I I I II II III I I I I II I I I I I I I I I I I I I I I II I I I I I I 
Db 1796 ACACT GTCAAGTTGACGT CGGGCCATTT GAAGT GTAGAGT GAAGATGGAAAAATTGCAGT 1855 

Qy 901 T GAAGGGAACAACAT AT GGT GT AT GCT CAAAAGC AT T CAAATT CGCT AGGACT C C C GCT G 960 

I I I I I I I I I I I I I II II II II I I I I I II I I I I I II I I I I I I I I I I I I 
Db 1856 TGAAGGGAACAACTTACGGCGTCTGTTCAAAGGCTTTCAAGTTTCTTGGGACTCCCGCAG 1915 

Qy 961 ACACT GGT CATGGAACGGTGGTGCTGGAACTGCAGT AT ACCGGAAAAGACGGGCCTTGCA 1020 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1916 ACACAGGTCACGGCACTGTGGTGTTGGAATTGCAGTACACTGGCACGGATGGACCTTGCA 1975 

Qy 1021 AAGTGCCCATTTCTTCTGTGGCTTCCCTGAACGACCTTACACCCGTTGGAAGGCTGGTGA 1080 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I Mill. 
Db 1976 AAGTTCCCATCT CGT CAGT GGCTTCATTGAACGACCTAACACCGGTAGGCAGATT GGT CA 2035 

Qy 1081 CT GT GAAT CCAT TT GT GT CT GT GGCT AC GG C CAACT CGAAGGTTT T GATT GAACT CGAAC 1140 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2036 CTGTTAACCCTTTTGTTTCAGTGGCCACGGCCAACGCCAAGGTCCTGATTGAACTGGAAC 2095 

Qy 1141 C CCCGT TT AGT GACT CT TAC AT CGT GGT GGGGAGAG GAGAACAGCAGATAAAC CACCACT 1200 

I II III I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I 
Db 2096 CACCCTTTGGAGACT CATACATAGT GGT AGGCAGAGGAGAACAACAGATCAAT CACCATT 2155 

Qy 1201 GGCACAAAT CT GGGAGC AGT ATT GGAAAGGCTT T CAC CACTAC ACTCAGAGGAGCT CAAC 1260 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 2156 GGCATAAGTCTGGAAGCAGCATTGGCAAAGCCTTTACAACCACTCTCAAAGGGGCGCAGA 2215 

Qy 1261 GACTTGCAGCTCTTGGAGACACTGCCTGGGATTTTGGATCAGTCGGAGGGGTTTTCACCT 1320 

II I II I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I I I I I I I II I I I I 

Db 2216 GAT T AGCC GCT CTAGGAGACACAGCTT GGGACTT CGGAT C AGTT GGAG GGGTGTTT AC CT 2275 

Qy 1321 C GGT AGGGAAAGCC AT AC ACCAAGTT TTT GGAGGAGC CTT T AGAT CACT CT TT GGAGGGA 1380 

I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 
Db 2276 CAGTAGGGAAGGCTGTCCATCAAGTGTTCGGTGGAGCATTCCGCTCACTGTTCGGAGGTA 2335 

Qy 1381 TGTCCTGGATCACACAGGGGCTTCTGGGAGCTCTTCTGCTGTGGATGGGAATTAACGCCC 1440 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II II I 
Db 2336 TGTCCTGGATAACGCAGGGATTGCTGGGGGCTCTTCTGTTGTGGATGGGCATCAATGCTC 2395 

Qy 1441 GTGACAGGTCAATTGCTATGACGTTCCTTGCGGTTGGAGGAGTCTTGCTCTTCCTTTCGG 1500 

I I I I I I I I I I II III I I I I I I II II I I I I I I I II I I I I I I I I I I I I I II I 
Db 2396 GTGACAGGTCCATAGCTCTCACGTTTCTCGCAGTTGGAGGAGTCCTGCTCTTCCTCTCCG 2455 



Qy 1501 TCAACGTCCATGCTG 1515 

I I II I I I I I I I I 
Db 2456 TGAACGTGCACGCTG 2470 
LOCUS AY660002 11029 bp RNA linear VRL 19-DEC-2004 

DEFINITION West Nile virus isolate Mex03 from Mexico, complete genome. 
FEATURES Location/Qualifiers 
/organism="West Nile virus" 
/mol_type="genomic RNA" 
/strain="TM171-03" 
/isolate="Mex03" 
/db_xref="taxon:11082" 
/country="Mexico" 
CDS 97..10398 

/codon_start=1 

/product="polyprotein precursor" 
/protein_id="AAV68177.1" 
/db_xref="GI:55975603" 

/translation="MSKKPGGPGKSRAVNMLKRGMPRVLSLIGLKRAMLSLIDGKGPI 
RFVLALLAFFRFTAIAPTRAVLDRWRGVNKQTAMKHLLSFKKELGTLTSAINRRSSKQ 
KKRGGKTGIAVMIGLIASVGAWLSNFQGKVMMTVNATDVTDVITIPTAAGKNLCIVR 
AMDVGYMCDDTITYECPVLSAGNDPEDIDCWCTKSAVYVRYGRCTKTRHSRRSRRSLT 
VQT HGE S T LAN KKGAWMD ST KAT RYLVKT ES W I LRN P G YALVAA VTGWML GS NTMQRV 
VFVVLLLLVAPAYSFNCLGMSNRDFLEGVSGATWVDLVXEGDSCVTIMSKDKPTIDVK 
MMNMEIAANLAEVI^SYCYLATVSDLSTKAACPTMGEAHNDKRADPAFVCRQGVVDRGWG 
NGCGLFGKGS I DTCAKFACSTKAI GRTI LKENI KYEVAI FVHGPTTVESHGN YPTQVG 
ATQAGRFS IT PAAP S YTLKLGEYGEVTVDCEPRS GI DTNAYYVMTVGTKT FLVHREWF 
MDLNLPWSSAGSTVWRNRETLMEFEEPHATKQSVIALGSQEGALHQALAGAIPVEFSS 
NTVKLTSGHLKCRVKMEKLQLKGTTYGVCSKAFKFLGTPADTGHGTVVLELQYTGTDG 
PCKVPISSVASLNDLTPVGRLVTVNPFVSVATANAKVLIELEPPFGDSYIWGRGEQQ 
INHHWHKSGSSIGKAFTTTLKGAQRLAALGDTAWDFGSVGGVFTSVGKAVHQVFGGAF 
RSLFGGMSWITQGLLGALLLWMGINARDRSIALTFLAVGGVLLFLSVNVHADTGCAID 
I S RQELRCGSGVFI HNDVEAWMDRYKY YPET PQGLAKI I QKAHKEGVCGLRS VS RLEH 



QMWEAVKDELNTLLKENGVDLSVVVEKQEGMYKSAPKRLTATTEKLEIGWKAWGKSIL 
FAPEl^NTEWDGPETKECPTQNRAWNSLEVEDFGFGLTSTRMFLKVRESNTTECDS 
KIIGTAVKNNLAIHSDLSYWIESRLNDTWKLERAVLGEVKSCTWPETHTLWGDGILES 
DLIIPVTLAGPRSNHNRRPGYKTQNQGPWDEGRVEIDFDYCPGTTVTLSESCGHRGPA 
TRTTTESGKLITDWCCRSCTLPPLRYQTDSGCWYGMEIRPQRHDEKTLVQSQVNAYNA 
DMI DP FQLGLLWFLATQEVLRKRWTAKI SMPAI LI ALLVLVFGGI T YTDVLRYVI LV 
GAAFAE S N S GGDWH LALMAT FKI Q P VFMVAS FLKARWTNQ EN I L LMLAAVFFQMAYH 
DARQI LLWEI PDVLN S LAVAWMI LRAI TFTTT S^^VVVPLLALLT PGLRCLNLDVYRI L 
LLMVGIGSLIREKRSAAAKKKGASLLCLALASTGLFNPMILAAGLIACDPNRKRGWPA 
TEVMTAVGLMFAIVGGLAELDIDSMAIPMTIAGIM 

ESDAEITGSSERVDVRLDDDGNFQLMNDPGAPWKIWMLRMVCLAISAYTPWAILPSW 

GFWITLQYTKRGGVLWDTPSPKEYKKGDTTTGVYRIMTRGLLGSYQAGAGVMVEGVFH 

TLWHTTKGAALMSGEGRLDPYWGSVKEDRLCYGGPWKLQHKWNGQDEVQMIWEPGKN 

VKNVQTKPGVFKTPEGEIGAVTLDFPTGTSGSPIVDKNGDVIGLYGNGVIMPNGSYIS 

AIVQGERMDEPIPAGFEPEMLRKKQITVLDLHPGAGKTRRILPQIIKEAINRRLRTAV 

LAPT RWAAEMAEALRGL P I RYQT S AVP REHNGN EI VDVMCHAT LTH RLMS PH RVPN Y 

NLFVMDEAHFTDPASIAARGYISTKVELGEAAAI FMTATPPGTSDPFPESNSPISDLQ 

TEIPDRAWNSGYEWITEYTGKTVWFVPSVKMGNEIALCLQRAGKKVVQLNRKSYETEY 

PKCKNDDWDFVITTDISEMGANFKASRVIDSRKSVKPTIITEGEGRVILGEPSAVTAA 

SAAQRRGRIGRNPSQVGDEYCYGGHTNEDDSNFAHWTEARIMLDNINMPNGLIAQFYQ 

PEREKVYTMDGEYRLRGEERKNFLELLRTADLPVWLAYKVAAAGVSYHDRRWCFDGPR 

TNTILEDNNEVEVITKLGERKILRPRWIDARVYSDHQALKAFKDFASGKRSQIGLIEV 

LGKMPEHFMGKTWEALDTMYWATAEKGGRAHRMALEELPDALQTIALIALLSV1OT 

VF FLLMQ RKG I GK I GLGGAVLGVAT F FCWMAE VP GT K I AGMLL L S L L LMI VL I P E P E K 

QRSQTDNQLAVFLICVMTLVSAVAANEMGWLDKTKSDISSLFGQRIEVKENFSMGEFL 

LDLRPATAWSLYAVTTAVLTPLLKHLITSDYINTSLTSINVQASALFTLARGFPFVDV 

GVS AL L LAAGCWGQVT LT VT VTAAT LL FCH YAYMVP GWQ AEAMR S AQ RRTAAG I MKNA 

WD G I VAT D VP E L E RTT P I MQ KKVGQ I ML I LVS LAAVWN P S VKT VREAG I L I T AAAV 
TLWENGAS S VWNATTAI GLCH IMRGGWLSCLS I TWTLVKNMEKPGLKRGGAKGRTLGE 
VWKERLNQMT KEE FT RYRKEAI I EVDRS AAKHARKEGNVT GGH P VS RGTAKLRWLVER 
RFLEPVGKVIDLGCGRGGWCYYMATQKRVQEVRGYTKGGPGHEEPQLVQSYGWNIVTM 
KSGVDVFYRP S ECCDTLLCDI GES S S SAEVEEHRT I RVLEMVEDWLHRGPREFCVKVL 
CPYMPKVIEKMELLQRRYGGGLVRNPLSRNSTHEMYWV^ 

RMEKRTWKGPQYEEDVNLGSGTRAVGKPLLNSDTSKIKNRIERLRREYSSTWHHDENH 
P YRTWN YHGS YDVKPTGS AS S LVNGWRLLS KPWDT I TNVTTMAMTDTT P FGQQRVFK 
EKVDTKAPEPPEGVKYVLNETTNWLWAFLAREK^^ 

EQNQWRSAREAVEDPKFWEMVDEEREAHLRGECHTCIYNMMGKREKKPGEFGKAKGSR 
AIWFMWLGARFLEFEALGFLNEDHWLGRKNSGGGVEGLGLQKLGYILREVGTRPGGKI 
YADDTAGWDTRITRADLENEAKVLELLDGEHRRLARAI I ELTYRHKWKVMRPAADGR 
TVMDVISREDQRGSGQVWYALNTFTNLAVQLVRMMEGEGVIGPDDVEKLTKGKGPKV 
RTWLFENGEERLSRMAVSGDDCVVKPLDDRFATSLHFLNAMSKVRKDIQEWKPSTGWY 
DWQQVPFCSNHFTELIMKDGRTLWPCRGQDELVGRARISPGAGWNVRDTACLAKSYA 
QMWLLLYFHRRDLRLMANAICSAVPVNWVPTGRTTWSIHAGGEWMTTEDMLEVWNRW 
IEENEWMEDKTPVEKWSDVPYSGKREDIWCGSLIGTRARATWAENIQVAINQVRAIIG 
DEKYVD YMS S LKRYEDT I LVEDTVL " 
/product="nucleocapsid protein" 
2470..3525 

/product="non-structural protein 1" 
/product="non-structural protein 3" 

/note="NS3" 
ORIGIN 



Query Match 65.7%; Score 998.2; DB 10; Length 11029; 

Best Local Similarity 78.7%; Pred. No. 0; 

Matches 1192; Conservative 0; Mismatches 323; Indels 0; Gaps 0; 



Qy 1 C GGAAT TC AGCT TCAACTGT TT AGGAAT GAGCAACAGGGACTT C CT GGAGGGAGT GT CT G 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 956 CAGCTT AC AGCTTCAACTGC CTT GGAAT GAGCAAC AGAGACTT CTT GGAAGGAGT GT CT G 1015 

Qy 61 GAGCTACAT GGGTTGATCT GGTACTGGAAGGAGACAGTTGTGTGACCATAAT GTCAAAAG 120 

I I I I I I I I I I I I III I I I I II I I I I I I I I I I II I I I I I II I I I I I II I 
Db 1016 GAGCAACATGGGTGGATTTGGTTCTCGAAGGCGACAGCTGCGTGACTATCATGTCTAAGG 1075 

Qy 121 ACAAGCCAACCATT GATGTCAAAAT GAT GAACAT GGAAGCAGCTAAT CTCGCAGATGT GC 180 

I I I I I I I II I I I I I I I I II I II I I I I I I I I I I II II II II I I I I I II I 
Db 1076 ACAAGCCT ACCAT C GAT GT GAAGAT GAT GAAT AT GGAGGC GGCCAAC CT GGCAGAGGT CC 1135 

Qy 181 GTAGCTACTGCTACTTAGCTTCGGTCAGTGATCTGTCAACAAAAGCCGCGTGTCCAACCA 240 

I I I I II II I I I II I II I I I I I I I II I I II II I I I I I I I I I I II I I I I 
Db 1136 GTAGTTATTGCTATTTGGCTACCGTCAGCGATCTCTCCACCAAAGCTGCGTGCCCGACCA 1195 

Qy 241 TGGGTGAAGCTCACAACGAGAAAAGAGCCGACCCTGCCTTTGTTTGCAAGCAAGGCGTCG 300 

I I I I I I I I I I I I I I I II III I II I I I I I II I I I I I I I I I I I I I I II I 
Db 1196 T GGGAGAAGCTCACAAT GACAAACGTGCT GACCCAGCTTTTGTGT GCAGACAAGGAGTGG 1255 

Qy 301 T AGACAGAGGAT GGGGGAAT GGATGCGGACT GTTTGGAAAGGGGAGC ATT GAC ACAT GT G 360 

I I I I I I II I I I I I II II I I I I I I I I I I II I II II I I II I I I I I I I I I I I 
Db 1256 TGGACAGGGGCTGGGGCAACGGCTGCGGACTATTTGGCAAAGGAAGCATTGACACATGCG 1315 

Qy 361 CAAAGTTT GCCT GT ACAAC CAAGGCAACT GGTT GGAT T AT CCAGAAGGAAAACAT CAAGT 420 

I I I I I I I I I I I I I I I II I I I I I II II III I I I I I I I I I I I I I I 
Db 1316 CCAAATTT GCCT GCT CT AC CAAGGCAAT AGGAAGAAC CAT CTTGAAAGAGAAT AT CAAGT 1375 

Qy 421 ACGAGGTT GCC ATATTT GT GCAT GGC CC GAC GACT GT CGAAT CAC AT GGCAATT AT TCAA 480 

I I I I II I I II I I I I II II I I I II II I I I I I II II M M M M I I 



Db 1376 AC GAAGT GGC C ATT T T T GT C CAT GGAC CAACT ACTGT GGAGT CGCAC GGAAACTACC CC A 1435 

Qy 481 CAC AGAT AGG GGCT AC C CAAGC AGGAAGGTTC AGCAT AACT C CAT C GGC AC CAT CCT ACA 540 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1436 CACAGGT T GGAGCC ACT C AGGC AGGGAGAT T C AGCAT C ACT C CT GCGGC GCCT TC AT ACA 1495 

Qy 541 CGCTGAAGTT GGGT GAGTAT GGT GAGGT CACAGTTGACTGT GAGCCACGGTCAGGAAT AG 600 

I II III I II II I II I I I I I I I I I I I I I I I I I I I I II I I I I I I I II II I 
Db 1496 CACTAAAGCT T GGAGAAT ATGGAGAGGT GACAGT GGACT GT GAAC CAC GGT CAGGGATT G 1555 

Qy 601 ACACTAGCGCTTACTACGTTATGTCAGTGGGTGCGAAGTCCTTCTTGGTTCACCGAGAAT 660 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1556 AC AC CAAT GC AT ACT AC GT GAT GACT GTT GGAACAAAGAC GTT CT T GGT CC AT CGT GAGT 1615 

Qy 661 GGTTTATGGACCTGAACCTTCCATGGAGTAGCGCTGGAAGCACAACGTGGAGGAACCGGG 720 

I I I I I I I I I II I I I I I II I I I I I II I I I I I I I I II I I I I I I I I I I I I 
Db 1616 GGTTCATGGATCTCAACCTCCCTTGGAGCAGTGCTGGAAGTACTGTGTGGAGGAACAGAG 1675 

Qy 721 AAAC ACT GAT GGAGTTT GAAGAACCT C AT GCC AC CAAACAATCT GT C GT AGCT CT AGGGT 780 

III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1676 AGAC GTTAAT GGAGT TT GAGGAACCACAC GCC ACGAAGCAGT CT GT GATAGC ATT GGGCT 1735 

Qy 781 CGCAGGAAGGTGCCTTGCACCAAGCTCTGGCTGGAGCAATTCCTGTTGAGTTCTCAAGCA 840 

I II II II II I I I I I I I I I I I II I I I I I I I I I I I I I I I II II I I I I I I I 
Db 1736 CACAAGAGGGAGCTCTGCATCAAGCTTTGGCTGGAGCCATTCCTGTGGAATTTTCAAGCA 1795 

Qy 841 ACACTGT GAAGTTGACAT CAGGACATCT GAAGT GTAGGGTGAAGATGGAGAAGTTGCAGC 900 

I I I I I II I II I I I I I II II III I I I I I I II I I I I I I I I I I I I I II I I I I I I 
Db 1796 ACACTGT CAAGTTGACGT CGGGTCATTTGAAGT GTAGAGTGAAGATGGAAAAATTGCAGT 1855 

Qy 901 T GAAGGGAACAACAT AT GGTGT AT GCTCAAAAGCATT CAAATT C GCT AGGACT CCC GCT G 960 

I I I I I I I I I I I I I I I I I I II II I I I I I II I I I I I II I I I I I I I I I I I I 
Db 1856 TGAAGGGAACAACCTATGGCGTCTGTTCAAAGGCTTTCAAGTTTCTTGGGACTCCCGCAG 1915 

Qy 961 AC ACTGGT CAT GGAACGGT GGT GCT GGAACTGCAGT AT ACC GGAAAAGACGGGC CT T GC A 1020 

I I I I I I I I I II II I I I I I I I I I I I I I I I I I I II II I II II I I I I I I I 
Db 1916 AC AC AGGT CAC GGCACT GT GGT GTT GGAATTGCAGT AC ACT GGC ACGGAT GGAC CT T GCA 1975 

Qy 1021 AAGTGCCCATTTCTTCTGTGGCTTCCCTGAACGACCTTACACCCGTTGGAAGGCTGGTGA 1080 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I 
Db 1976 AAGTTCCTATCTCGTCAGTGGCTTCATTGAACGACCTAACGCCAGTGGGCAGATTGGTCA 2035 

Qy 1081 CTGTGAATCCATTTGTGTCTGTGGCTACGGCCAACTCGAAGGTTTTGATTGAACTCGAAC 1140 

I I I I II II II I I I II I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 
Db 2036 CTGTCAACCCTTTTGTTTCAGTGGCCACGGCCAACGCTAAGGTCCTGATTGAATTGGAAC 2095 

Qy 1141 CCCCGTTTAGTGACT CTTACAT CGT GGTGGGGAGAGGAGAACAGCAGATAAACCACCACT 1200 

I II III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I II I 
Db 2096 CAC C CTTT GGAGACT CAT ACAT AGT GGT GGGCAGAGGAGAACAACAGAT CAAT CAC CAT T 2155 

Qy 1201 GGC ACAAAT CT GGGAGC AGT AT T GGAAAGGCTTT C ACC ACT AC ACT CAGAGGAGCT CAAC 1260 

I II I I I I I I I I I I I I I I I I I I I II II II II II II I I I I I I I I I I II 
Db 2156 GGC ACAAGTCT GGAAGC AGCAT TGGCAAAGCCT TTACAACC AC C CTCAAAGGAGC GCAGA 2215 



Qy 1261 GACTTGCAGCTCTTGGAGACACTGCCTGGGATTTTGGATCAGTCGGAGGGGTTTTCACCT 1320 

II I I II I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2216 GACTAGCCGCTCTAGGAGACACAGCTTGGGACTTTGGATCAGTTGGAGGGGTGTTCACCT 2275 



Qy 1321 CGGTAGGGAAAGCCATACACCAAGTTTTTGGAGGAGCCTTTAGATCACTCTTTGGAGGGA 1380 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2276 CAGTTGGGAAGGCTGTCCATCAAGTGTTCGGAGGAGCATTCCGCTCACTGTTTGGAGGCA 2335 

Qy 1381 TGTCCTGGATCACACAGGGGCTTCTGGGAGCTCTTCTGCTGTGGATGGGAATTAACGCCC 1440 

I II I I I I I I I II II II I I I I I I I I I I I III I I I I I I I I I I I I I I I II r 
Db 2336 TGTCCTGGATAACGCAAGGATTGCTGGGGGCTCTCCTGTTGTGGATGGGCATTAATGCTC 2395 

Qy 1441 GTGACAGGTCAATTGCTATGACGTTCCTTGCGGTTGGAGGAGTCTTGCTCTTCCTTTCGG 1500 

I I I I I I I I I II III I I I I I I II II I I I I I I I I I I I I I I I I I I I I I II I 
Db 2396 GTGATAGGTCCATAGCTCTCACGTTTCTCGCAGTTGGAGGAGTTCTGCTCTTCCTCTCCG 2455 



Qy 1501 TCAACGTCCATGCTG 1515 

I I I I I I I I I II I I 
Db 2456 TGAACGTGCATGCTG 2470 
LOCUS DQ164190 11029 bp RNA linear VRL 18-NOV-2005 

DEFINITION West Nile virus isolate NY 2003 Suffolk, complete genome. 
ACCESSION DQ164190 
VERSION DQ164190.1 GI:76781539 
SOURCE West Nile virus (WNV) 
ORGANISM West Nile virus 
Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae; 
Flavivirus; Japanese encephalitis virus group. 
REFERENCE 1 (bases 1 to 11029) 
AUTHORS Davis,C.T., Ebel,G.D., Lanciotti,R.S., Brault,A.C., Guzman,H., 
Siirin,M., Lambert,A., Parsons,R.E., Beasley,D.W., Novak,R.J., 
Elizondo-Quiroga,D., Green,E.N., Young,D.S., Stark,L.M., 
Drebot,M.A., Artsob,H., Tesh,R.B., Kramer,L.D. and Barrett,A.D. 
TITLE Phylogenetic analysis of North American West Nile virus isolates, 
2001-2004: Evidence for the emergence of a dominant genotype 
JOURNAL Virology 342 (2), 252-265 (2005) 
PUBMED 16137736 
REFERENCE 2 (bases 1 to 11029) 
AUTHORS Davis,T.C., Ebel,G.D., Lanciotti,R.S. and Brault,A.C. 
TITLE Direct Submission 
JOURNAL Submitted (11-AUG-2005) Pathology, University of Texas Medical 
Branch, 301 University Blvd., Galveston, TX 77550, USA 
FEATURES Location/Qualifiers 
source 1..11029 
/organism="West Nile virus" 
/mol_type="genomic RNA" 
/isolate="NY 2003 Suffolk" 
/specific_host="American crow" 
/db_xref="taxon:11082" 
/country="USA: NY, Suffolk" 
/collection_date="2003" 
CDS 97..10398 
/note="encodes C, prM, E, NS1, NS2A, NS2B, NS3, NS4A, 
NS4B, NS5" 
/codon_start=1 
/product="polyprotein precursor" 
/protein_id="ABA54579.1" 
/db_xref="GI:76781540" 

/translation="MSKKPGGPGKSRAVNMLKRGMPRVLSLIGLKRAMLSLIDGKGPI 

REVLALIAFFRFTAIAPTRAVLNRWRGWKQTAMKHLLSFKKELGTLTSAINRRSSKQ 

KKRGGKTGIAVMIGLIASVGAWLSNFQGKVM^WATDVTDVITIPTAAGKNLCIVR 

AMDVGYMCDDTITYECPVLSAGNDPEDIDCWCTKSAVYVRYGRCTKTRHSRRSRRSLT 

VQT H G E S T LAN KKGAWMD S T KAT R YLVKT E S W I LRN P G YALVAAVI GWMLG S NTMQRV 

VFWLLL LVAP AY S FNC LGMSN RD FL E GVS GATWVD LVL E G D S CVT I MS KD K PT I DVK 

MMNMEAANLAEWSYCYIATVSDLSTKAACP™^ 

NGCGLFGKGSIDTCAKFACSTKAIGRTILKENIKYEVAIFVHGPTTVESHGNYSTQAG 
ATQAGRFSITPAAPSYTLKLGEYGEVTVDCEPRSGIDTNAYYVMTVGTKTFLVHREWF 
MDLNLPWSSAGSTWRNRETLMEFEEPHATKQSVIALGSQEGALHQALAGAI PVEFSS 
NTVKLT S GHLKCRVKMEKLQLKGTT YGVC S KAFKFLGT PADTGHGT WLELQYTGTDG 
PCKVPISSVASLNDLTPVGRLVTVNPFVSVATANAKVLIELEPPFGDSYIWGRGEQQ 
I NHHWHKS GS S I GKAFTTTLKGAQRLAALGDTAWDFGS VGGVFT S VGKAVHQVFGGAF 
RS LFGGMSWI TQGLLGALLLWMGINARDRS I ALT FLAVGGVLLFLS VNVHADTGCAI D 
ISRQELRCGSGVFIHNDVEAWMDRYKYYPETPQGLAKIIQKAHKEGVCGLRSVSRLEH 
QMWEAVKDELNTLLKENGVDLSVVVEKQEGMYKSAPKRLTATTEKLEIGWKAWGKSIL 
FAPELANNTFVVDGPETKECPTQNRAWNSLEVEDFGFGLTSTRMFLKVRESNTTECDS 
KIIGTAVXNNLAIHSDLSYWIESRLNDTWKLERAVLGEVKSCTWPETHTLWGDGILES 
DLIIPVTLAGPRSNHNRRPGYKTQNQGPWDEGRVEIDFDYCPGTTVTLSESCGHRGPA 
TRTTTESGKLITDWCCRSCTLPPLRYQTDSGCWYGMEIRPQRHDEKTLVQSQVNAYNA 
DMIDPFQLGLLWFLATQEVLRKRWTAKISMPAILIALLVLVFGGITYTDVLRYVILV 
GAAFAESNSGGDVVHLALMATFKIQPVFMVASFLK 

DARQILLWEIPDVLNSIAVAWMILRAITFTTTSNVWPLLALLTPGLRCLNLDVYRIL 

LLMVGIGSLIREKRSAAAXKKGASLLCLAIASTGLFNPMILAAGLIACDPNRKRGWPA 

TEVMTAVGLMFAIVGGLAELDIDSMAI PMTIAGLMFAAFVISGKSTDMWIERTADISW 

ESDAEITGSSERVDVRLDDDGNFQLMNDPGAPWKIWMLRMVCLAI SAYTPWAILPSW 

GFWITLQYTKRGGVLWDTPSPKEYKKGDTTTGVYRIMTRGLLGSYQAGAGVMVEGVFH 

TLWHTTKGAALMSGEGRLDPYWGSVKEDRLCYGGPWKLQHKWNGQDEVQMIVVEPGKN 

VKNVQTKPGVFKTPEGEIGAVTLDFPTGTSGSPIVDKNGDVIGLYGNGVIMPNGSYIS 

AIVQGERMDEPIPAGFEPEMLRKKQITVLDLHPGAGKTRRILPQIIKEAINRRLRTAV 

LAPTRWAAEMAEALRGLPIRYQTSAVPREHNGNEIVDVMCHATLTHRLMSPHRVPNY 

NLFVMDEAHFTDPASIAARGYI STKVELGEAAAI FMTATPPGTSDPFPESNSPI SDLQ 

TEIPDRAWNSGYEWITEYTGKTVWEVPSVKMGNEIALCLQRAGKKWQLNRKSYETEY 

PKCKNDDWDFVITTDISEMGANFKASRVIDSRKSVKPTIITEGEGRVILGEPSAVTAA 

SAAQRRGRIGRNPSQVGDEYCYGGHTNEDDSNFAHWTEARIMLDNINMPNGLIAQFYQ 

PEREKVYTMDGEYRLRGEERKNFLELLRTADLPVWLAYKVAAAGVSYHDRRWCFDGPR 

TNTILEDNNEVEVITKLGERKILRPRWIDARVYSDHQALKAFKDFASGKRSQIGLIEV 

LGKMPEHFMGKTWE1ALDTMYVVATAEKGGRAHRMALEELPDALQTIALIAL 

VFFLLMQRKGIGKIGLGGAVLGVATFFCWMAEVPGTKIAGMLLLSLLLMIVLIPEPEK 

QRSQTDNQIAVFLICVMTLVSAVAANEMGWLDKTKSDISSLFGQRIEVKENFSMGEFL 

L DL RP AT AWS L YAVTT AVLT PLLKHLITSDYINTSLTSI NVQAS AL FT LARG FP FVDV 

GVS AL LLAAGCWGQVT LT VT VT AAT L L FC H YAYMVP GWQAEAMRS AQ RRTAAGI MKN A 

VVDGIVATDVPELERTTPIMQKKVGQIMLILVSIAAVVVNPSVKTVREAGILITAAAV 

TLWENGAS S VWNATTAI GLCH I MRGGWL S CL S I TWT LI KNMEKP GLKRGGAKGRT LGE 

VWKERLNQMTKEEFTRYRKEAIIEVDRSAAKHARKEGNVTGGHPVSRGTAKLRWLVER 

RFLEPVGKVIDLGCGRGGWCYYMATQKRVQEVRGYTKGGPGHEEPQLVQSYGWNIVTM 

KSGVDVFYRPSECCDTLLCDIGES SS SAEVEEHRTI RVLEMVEDWLHRGPREFCVKVL 

CPYMPKVIEKMELLQRRYGGGLVRNPLSRNSTHENTYW 

RMEKRTWKGPQYEEDVNLGSGTRAVGKPLLNSDTRKI KNRI ERLRREYS STWHHDENH 
P YRTWN YHGS YDVKPTGSAS S LVNGWRLLS KPWDT I TNVTTMAMTDTT P FGQQRVFK 
EKVDTKAPEPPEGVKYVLNETTNWLWAFLAREKRPRMC^ 

EQNQWRSARE1AVEDPKFWE1MVDEEREAHLRGECHTCIY1JMMGKREKKPGEFGKAKGSR 
AIWFMWLGARFLEFEALGFLNEDHWLGRKNSGGGVEGLGLQKLGYILREVGTRPGGKI 
YADDTAGWDTRITRADLENEAKVLELLDGEHRRLARAI I ELTYRHKWKVMRPAADGR 
TVMDVISREDQRGSGQVVTYALNTFTNLAVQLVRMMEGEGVIGPDDVEKLTKGKGPKV 



RTWLFENGEERLSRMAVSGDDCVVKPLDDRFATSLHFLNAMSKVRKDIQEWKPSTGWY 
DWQQVPFCSNHFTELIMKDGRTLWPCRGQDELVGRT^ISPGAGWNVRDTACLAKSYA 
QMWLLLYFHRRDLRLMANAICSAVPVNWVPTGRTTWSIHAGGEWMTTEDMLEVWNRVW 
I EENEWMEDKT PVEKWS DVP YS GKREDI WCGS LI GTRARATWAENI QVAINQVRAI I G 
DEKYVD YMS S LKRYEDTT LVEDTVL " 

ORIGIN 

Query Match 65.7%; Score 998.2; DB 10; Length 11029; 

Best Local Similarity 78.7%; Pred. No. 0; 

Matches 1192; Conservative 0; Mismatches 323; Indels 0; Gaps 0; 

Qy 1 C GGAAT T CAGCT T CAACT GT T T AGGAAT GAGCAACAGGGACTT C CTGGAGGGAGT GT CT G 60 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 956 CAGCTT ACAGCT T CAACT GCCT T GGAAT GAGCAACAGAGACTT CTTGGAAGGAGT GT CT G 1015 

Qy 61 GAGCT ACATGGGTTGATCTGGTACTGGAAGGAGACAGTTGTGT GACCATAATGT CAAAAG 120 

I I I I MINIM III I I I I II II I II I I I I I II Mill II I I I I I II I 
Db 1016 GAGCAACATGGGTGGATTTGGTTCTCGAAGGCGACAGCTGCGTGACTATCATGTCTAAGG 1075 

Qy 121 ACAAGCCAAC CAT T GAT GT CAAAAT GAT GAACAT GGAAGCAGCT AAT CTC GC AGAT GT GC 180 

I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I II II II II I I I I I II I 
Db 1076 ACAAGCCT AC CATC GAT GTGAAGAT GAT GAAT AT GGAGGCGGC CAAC CTGGC AGAGGT C C 1135 

Qy 181 GTAGCTACTGCTACTTAGCTTCGGTCAGTGATCTGTCAACAAAAGCCGCGTGTCCAACCA 240 

I I I I I I I M I I I I I I I I I I I I I I I M I I M I I I I I I I I I I I I I I I I 

Db 1136 GCAGTT ATT GCT AT T TGGCT AC CGT CAGCGAT CT CT C CAC CAAAGCT GCGT GC C CGAC CA 1195 

Qy 241 TGGGTGAAGCTCACAACGAGAAAAGAGCCGACCCTGCCTTTGTTTGCAAGCAAGGCGTCG 300 

I I I I I I I I I I I I I I I II III I II I I I I I II I I I I I I I I I I I I I I II I 
Db 1196 T GGGAGAAGCTCACAATGACAAACGTGCT GACCCAGCTTTT GTGT GCAGACAAGGAGT GG 1255 

Qy 301 T AGACAGAGGAT GGGGGAAT GGATGC GGACT GT T T G GAAAGGGGAGCATT GAC ACATGT G 360 

I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1256 TGGACAGGGGCTGGGGCAACGGCTGCGGACTATTTGGCAAGGGAAGCATTGACACATGCG 1315 

Qy 361 CAAAGT TT GC CT GT ACAACCAAGGCAACT GGTT GGAT TAT CCAGAAGGAAAAC AT CAAGT 420 

I I I I I I I I I I I I I I I I I I I I I I II II III I I I I I I I I I I I I I I 
Db 1316 CCAAAT TTGC CT GCT CT ACCAAGGCAAT AGGAAGAAC CAT CT TGAAAGAGAAT AT CAAGT 1375 

Qy 421 ACGAGGTTGCCATATTTGTGCATGGCCCGACGACTGTCGAATCACATGGCAATTATTCAA 480 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1376 ACGAAGTGGCCATTTTTGTCCAT GGACCAACTACTGT GGAGT CGCACGGAAACTACTCCA 1435 

Qy 481 CACAGATAGGGGCTACCCAAGCAGGAAGGTT CAGCATAACTCCATCGGCACCAT CCTACA 540 

I I I I I I I I I II I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I 
Db 1436 C ACAGGCT GGAGCC ACT CAGGC AGGGAGAT T CAGCAT C ACT C CT GCGGCGC CT T CAT ACA 1495 

Qy 541 C GCT GAAGTT GGGT GAGT AT GGT GAGGT CACAGTT GACT GT GAGC CAC GGT CAGGAAT AG 600 

I II I II I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I 
Db 1496 C ACT AAAGCTT GGAGAAT AT GGAGAGGT GAC AGT GGACT GTGAAC CAC GGT CAGGGATT G 1555 

Qy 601 ACACTAGCGCTTACTACGTTATGTCAGTGGGTGCGAAGTCCTTCTTGGTTCACCGAGAAT 660 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
Db 1556 ACACCAATGCATACTACGTGATGACT GTTGGAACAAAGACGTT CTTGGTCCATCGT GAGT 1615 



Qy 661 GGTTTATGGACCTGAACCTTCCATGGAGTAGCGCTGGAAGCACAACGTGGAGGAACCGGG 720 

I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I II I I I I I I I I I I I I 



Db 1616 GGTTCATGGACCTCAACCTCCCTTGGAGCAGTGCTGGAAGTACTGTGTGGAGGAACAGAG 1675 

Qy 721 AAACACT GAT GGAGT TT GAAGAACCT CAT GCCAC CAAACAAT CT GT C GT AGCT CT AGGGT 780 

III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1676 AGACGT TAAT GGAGT T T GAGGAACC ACACGCCAC GAAGCAGT CT GT GAT AGC AT T G G GCT 1735 

Qy 781 CGCAGGAAGGTGCCTTGCACCAAGCTCTGGCTGGAGCAATTCCTGTTGAGTTCTCAAGCA 840 

I II II II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 
Db 1736 CACAAGAGGGAGCT CTGCAT CAAGCTTTGGCTGGAGCCATT CCTGTGGAATTCT CAAGCA 1795 

Qy 841 AC ACT GT GAAGTT GACAT C AGGACAT CT GAAGTGT AGGGT GAAGAT GGAGAAGT T GC AGC 900 

I I I I I I I I I I I I I I I II II III I I I I I I I I I I I I I I I I II I I I II I I I I I I 

Db 1796 AC ACT GT CAAGTT GACGT C GGGT CAT TT GAAGT GT AGAGT GAAGAT GGAAAAATT GCAGT 1855 

Qy 901 TGAAGGGAACAACATATGGTGTATGCTCAAAAGCATTCAAATTCGCTAGGACTCCCGCTG 960 

II I I II I I I II I I I I I II II II Mill II I I I I I II I I I I I I I I I I I I 

Db 1856 TGAAGGGAACAACCTATGGCGTCTGTTCAAAGGCTTTCAAGTTTCTTGGGACTCCCGCAG 1915 

Qy 961 AC ACT GGT CAT GGAAC GGT GGT GCT GGAACT GCAGT AT AC C GGAAAAGAC G GGC CTT GCA 1020 

I I I I I I I I I II II MINI Mill II I I I I I II II I M M II I I II I 
Db 1916 ACACAGGTCACGGCACTGTGGTGTTGGAATTGCAGTACACTGGCACGGATGGACCTTGCA 1975 

Qy 1021 AAGTGCCCATTTCTTCTGTGGCTTCCCTGAACGACCTTACACCCGTTGGAAGGCTGGTGA 1080 

I I I I' M I I I I I I I I I I I 1 M I M I I I I I I I I I I I M I I I I I I II I 
Db 1976 AAGTTCCTATCTCGTCAGTGGCTTCATTGAACGACCTAACGCCAGTGGGCAGATTGGTCA 2035 

Qy 1081 CT GT GAAT C CAT TT GT GT CT GT GGCT AC GGCCAACTC GAAGGT T T T GATT GAACT CGAAC 1140 

I I I I II II I I I I I II Mill I I I I I I I I I I I I I I I I II I I I II I I II I 

Db 2036 CT GT CAAC C CT TT T GT T T CAGT GGC C ACGGCCAAC GCT AAGGT C CT GATTGAATT GGAAC 2095 

Qy 1141 CCCCGTTTAGT GACTCTT ACAT CGT GGTGGGGAGAGGAGAACAGCAGATAAACCACCACT 1200 

I II III I II I I I I I I II I I I I I I I I II II I II I I I I II II I II II I I I I 
Db 2096 CACCCTTT GGAGACTCAT ACATAGT GGTGGGCAGAGGAGAACAACAGATCAATCACCATT 2155 

Qy 1201 GGCACAAAT CT GGGAGC AGT AT T GGAAAGGCT TT CAC CACT ACACT CAGAGGAGCT CAAC 1260 

I I II I I I I I I II I II I I II I II II II II II II II I II I I I II II II 
Db 2156 GGCACAAGT CT GGAAGC AGCAT T GGCAAAGC CTT T ACAAC CACC CT CAAAGGAGC GC AGA 2215 

Qy 1261 GACTTGCAGCTCTTGGAGACACTGCCTGGGATTTTGGATCAGTCGGAGGGGTTTTCACCT 1320 

I I I I II I I I I I I I I I II I I II I I II I I I I I I II I I I I II II I II I I I I I I I I 
Db 2216 GACT AGC C GCT CT AGGAGACACAGCT T GGGACTT T GGAT CAGT T GGAGGGGT GT T CAC CT 2275 

Qy 1321 CGGT AGGGAAAGC CAT AC ACCAAGT T TTT GGAGGAGC CT T T AGAT CACT CTT T GGAGGGA 1380 

I II I II I I I I I I I I II I I II M I II I I I I I I I I II I I I II II I I 
Db 2276 CAGTTGGGAAGGCTGTCCATCAAGTGTTCGGAGGAGCATTCCGCTCACTGTTCGGAGGCA 2335 

Qy 1381 TGTCCTGGATCACACAGGGGCTTCTGGGAGCTCTTCTGCTGTGGATGGGAATTAACGCCC 1440 

I II I I I II I I II II II I Mill II II I III II I I II I I I I II M II I 
Db 2336 TGTCCTGGATAACGCAAGGATTGCTGGGGGCTCTCCTGTTGTGGATGGGCATCAATGCTC 2395 

Qy 1441 GTGACAGGTCAATTGCTATGACGTTCCTTGCGGTTGGAGGAGTCTTGCTCTTCCTTTCGG 1500 

I II I I II I I II III I II I I I II II II II I II II I I I I I II I II I I II I 
Db 2396 GTGATAGGTCCATAGCTCTCACGTTTCTCGCAGTTGGAGGAGTTCTGCTCTTCCTCTCCG 2455 

Qy 1501 TCAACGTCCATGCTG 1515 

I II I I I I II I I I I 
Db 2456 TGAACGTGCATGCTG 2470 
LOCUS AY371271 2004 bp RNA linear VRL 25-NOV-2003 

DEFINITION West Nile virus strain TM171-03 polyprotein gene, partial cds. 
ACCESSION AY371271 
VERSION AY371271.1 GI:38224786 
SOURCE West Nile virus (WNV) 
ORGANISM West Nile virus 
Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae; 
Flavivirus; Japanese encephalitis virus group. 
REFERENCE 1 (bases 1 to 2004) 
AUTHORS Estrada-Franco,J.G., Navarro-Lopez,R., Beasley,D.W.C., Coffey,L., 
Carrara,A.-S., Travassos da Rosa,A., Clements,T., Wang,E., 
Ludwig,G.V., Campomanes Cortes,A., Paz Ramirez,P., Tesh,R.B., 
Barrett,A.D.T. and Weaver,S.C. 
TITLE West Nile virus in Mexico: evidence of widespread circulation since 
July, 2002 
JOURNAL Emerging Infect. Dis. 9 (12), 1604-1607 (2003) 
REFERENCE 2 (bases 1 to 2004) 
AUTHORS Beasley,D.W.C., Estrada-Franco,J.G., Tesh,R.B., Weaver,S.C. and 
Barrett,A.D.T. 
TITLE Direct Submission 
JOURNAL Submitted (20-AUG-2003) Pathology, University of Texas Medical 
Branch, 301 University Blvd., Galveston, TX 77555-0609, USA 
FEATURES Location/Qualifiers 
source 1..2004 
/organism="West Nile virus" 
/mol_type="genomic RNA" 
/strain="TM171-03" 
/db_xref="taxon:11082" 
/country="Mexico" 
CDS <1..>2004 
/codon_start=1 
/product="polyprotein" 
/protein_id="AAR14153.1" 
/db_xref="GI:38224787" 

/translation="WLSNFQGKV>IMTWATDWDVITIPTAAGKNLCIvI^AMDVG™ 
CDDTITYECPVLSAGNDPEDIDCWCTKSAVYVRYGRCTKTRHSRRSRRSLTVQTHGES 
TLANKKGAWMDSTKATRYLVl^TESWILRNPGYALVAAWGWMLGSNTMQRWEVVLLL 
LVAPAYSFNCLGMSNRDFLEGVSGATWVT)LVXEGDSCWIMSKDKPTIDV1<MMNME1AA 
NLAEVKSYCYLATVSDLSTKAACPTMGEAHNDKRADPAFVCRQGVVDRGWGNGCGLFG 
KGSIDTCAKFACSTKAIGRTILKENIKYEVAIFVHGPTTVESHGNYPTQVGATQAGRF 
SITPAAPSYTLKLGEYGEVTVDCEPRSGIDTNAYYVMTVGTKTFLVHREWFMDLNLPW 
SSAGSTWRNRETLMEFEEPHATKQSVIALGSQEGALHQALAGAIPVEFSSNTVKLTS 
GHLKCRVKMEKLQLKGTT YGVCS KAFKFLGT PADTGHGTWLELQ YTGTDGPCKVP I S 
SVASLNDLTPVGRLWVNPFVSVATANAKVLIELEPPFGDSYIWGRGEQQINHHWHK 
SGS S I GKAFTTTLKGAQRLAALGDTAWDFGSVGGVFTSVGKAVHQVFGGAFRSLFGGM 
SWITQGLLGALLLWMGINARDRSIALTFLAVGGVTjLFLSWVHA" 

mat_peptide 1..276 

/product="pre-membrane protein prM" 

mat_peptide 277..501 

/product="membrane protein M" 

mat_peptide 502..2004 

/product="envelope protein E" 

ORIGIN 



Query Match 65.6%; Score 997.2; DB 10; Length 2004; 

Best Local Similarity 78.7%; Pred. No. 0; 

Matches 1191; Conservative 0; Mismatches 323; Indels 0; Gaps 0; 



Qy 1 CGGAATTCAGCTTCAACTGTTTAGGAATGAGCAACAGGGACTTCCTGGAGGGAGTGTCTG 60 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I 
Db 491 C AGCT T ACAGCTT CAACT GC CTT GGAAT GAGCAAC AGAGACT T CTT GGAAGGAGT GT CTG 550 

Qy 61 GAGCT ACAT GGGT T GAT CT GGT ACT G GAAGGAGAC AGTT GT GT GAC CATAAT GT CAAAAG 120 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I 
Db 551 GAGCAACAT GGGT GGAT TT GGT T CT CGAAGGC GAC AGCT GC GT GAC TAT CAT GT CTAAGG 610 

Qy 121 ACAAGCCAACCATTGATGT CAAAAT GATGAACATGGAAGCAGCTAATCT CGCAGATGTGC 180 

I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I II II II II I I I I I II I 
Db 611 ACAAG C CT AC CAT CGAT GT GAAGAT GAT GAAT AT GGAGGC GGC CAACCT GGC AGAGGT CC 670 

Qy 181 GTAGCTACTGCTACTTAGCTTCGGTCAGTGATCTGTCAACAAAAGCCGCGTGTCCAACCA 240 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 671 GTAGTTATTGCTATTTGGCTACCGTCAGCGATCTCTCCACCAAAGCTGCGTGCCCGACCA 730 

Qy 241 TGGGTGAAGCTCACAACGAGAAAAGAGCCGACCCTGCCTTTGTTTGCAAGCAAGGCGTCG 300 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 731 TGGGAGAAGCT CACAAT GACAAACGT GCTGACCCAGCTTTTGT GTGCAGACAAGGAGT GG 790 

Qy 301 T AGACAGAGGATGGGGGAAT GGAT GC GGACT GTT T GGAAAGGGGAGCAT T GACACAT GTG 360 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 791 TGGACAGGGGCTGGGGCAACGGCTGCGGACTATTTGGCAAAGGAAGCATTGACACATGCG 850 

Qy 361 CAAAGTTT GCCTGTACAACCAAGGCAACTGGTT GGATTAT CCAGAAGGAAAACAT CAAGT 420 

I I I I I I I I I I I I I I I I I I I I I I II I I III I I I I I I I I I I I I I I 
Db 851 CCAAATTT GCCTGCTCTACCAAGGCAATAGGAAGAACCAT CTT GAAAGAGAAT AT CAAGT 910 

Qy 421 AC GAGGT T GCC ATATT T GT GCAT GGC C C GAC GACT GT CGAAT C ACAT GG CAATT ATT CAA 480 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
Db 911 AC GAAGT GGC CAT T TT T GT CCAT GGAC CAACT ACT GT GGAGT C GCAC GGAAACTACCC C A 970 

Qy 481 C ACAGAT AGGGGCT AC C CAAGC AGGAAGGT T C AGC ATAACTC C AT CGGCACC AT CCT ACA 540 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 971 C ACAGGT T GGAGC CACT CAGGC AGGGAGATT CAGCAT CACT CCTGC GGC GC CTT CAT ACA 1030 

Qy 541 CGCT GAAGTT GGGT GAGTAT GGT GAGGT C AC AGTTGACT GT GAGC CAC GGT C AGGAAT AG 600 

I II I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 
Db 1031 CACTAAAGCT T GGAGAAT AT GGAGAGGT GAC AGT GGACT GT GAAC CAC GGT CAGGGAT T G 1090 

Qy 601 ACACTAGCGCTTACTACGTTATGTCAGTGGGTGCGAAGTCCTTCTTGGTTCACCGAGAAT 660 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1091 ACACCAATGCATACTACGT GAT GACT GTTGGAACAAAGACGTT CTT GGTCCATCGTGAGT 1150 

Qy 661 GGTTT AT GGAC CT GAACCTT CCAT GGAGT AGC GCT GGAAGCACAACGT GGAGGAACC GGG 720 

I I I I I I I I I II I I I I I II I I I I I II I I I I I I I I II I I I I I I I I I I I I 
Db 1151 GGTT C ATGGAT CT CAACCT CCCT T GGAGCAGT GCT GGAAGTACT GT GT GGAGGAACAGAG 1210 

Qy 721 AAAC ACT GATGGAGTT T GAAGAACCT C AT GC C ACCAAACAAT CT GT CGT AGCT CT AGGGT 780 

I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I 
Db 1211 AGACGTTAATGGAGTTTGAGGAACCACACGCCACGAAGCAGTCTGTGATAGCATTGGGCT 1270

Qy 781 CGCAGGAAGGTGCCTTGCACCAAGCTCTGGCTGGAGCAATTCCTGTTGAGTTCTCAAGCA 840
I II II II II I I I I I I I II I I I I I I I I I I I I I I I I I I I II II I I I I II I
Db 1271 CACAAGAGGGAGCTCTGCATCAAGCTTTGGCTGGAGCCATTCCTGTGGAATTTTCAAGCA 1330

Qy 841 ACACTGTGAAGTTGACATCAGGACATCTGAAGTGTAGGGTGAAGATGGAGAAGTTGCAGC 900
I I I I I I I I I I I I I I I II II III I I I I I I I I I I I I I I I I I I I I I II I I I I I I
Db 1331 ACACTGTCAAGTTGACGTCGGGTCATTTGAAGTGTAGAGTGAAGATGGAAAAATTGCAGT 1390

Qy 901 TGAAGGGAACAACATATGGTGTATGCTCAAAAGCATTCAAATTCGCTAGGACTCCCGCTG 960
I I I I I I I I II I I I I I I I I II II I I I I I II I I I I I II I I I I I I I I I I I I
Db 1391 TGAAGGGAACAACCTATGGCGTCTGTTCAAAGGCTTTCAAGTTTCTTGGGACTCCCGCAG 1450

Qy 961 ACACTGGTCATGGAACGGTGGTGCTGGAACTGCAGTATACCGGAAAAGACGGGCCTTGCA 1020
I I I I I I I I I II II I I I I I I I I I I I I I I I I I I II II I II II I I I I I I I
Db 1451 ACACAGGTCACGGCACTGTGGTGTTGGAATTGCAGTACACTGGCACGGATGGACCTTGCA 1510

Qy 1021 AAGTGCCCATTTCTTCTGTGGCTTCCCTGAACGACCTTACACCCGTTGGAAGGCTGGTGA 1080
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 1511 AAGTTCCTATCTCGTCAGTGGCTTCATTGAACGACCTAACGCCAGTGGGCAGATTGGTCA 1570

Qy 1081 CTGTGAATCCATTTGTGTCTGTGGCTACGGCCAACTCGAAGGTTTTGATTGAACTCGAAC 1140
I I I I II II I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 1571 CTGTCAACCCTTTTGTTTCAGTGGCCACGGCCAACGCTAAGGTCCTGATTGAATTGGAAC 1630

Qy 1141 CCCCGTTTAGTGACTCTTACATCGTGGTGGGGAGAGGAGAACAGCAGATAAACCACCACT 1200
I II III I I I I I I I I I II I I I I I I I I I II I I I I I I I I I I I I I II I I II I I
Db 1631 CACCCTTTGGAGACTCATACATAGTGGTGGGCAGAGGAGAACAACAGATCAATCACCATT 1690

Qy 1201 GGCACAAATCTGGGAGCAGTATTGGAAAGGCTTTCACCACTACACTCAGAGGAGCTCAAC 1260
I I I I I I I I I I I I I I I I I I I I I I II II II II II II I I I I I I I I I I II
Db 1691 GGCACAAGTCTGGAAGCAGCATTGGCAAAGCCTTTACAACCACCCTCAAAGGAGCGCAGA 1750

Qy 1261 GACTTGCAGCTCTTGGAGACACTGCCTGGGATTTTGGATCAGTCGGAGGGGTTTTCACCT 1320
I I I I II I I I I I I I I I I I I I II I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I
Db 1751 GACTAGCCGCTCTAGGAGACACAGCTTGGGACTTTGGATCAGTTGGAGGGGTGTTCACCT 1810

Qy 1321 CGGTAGGGAAAGCCATACACCAAGTTTTTGGAGGAGCCTTTAGATCACTCTTTGGAGGGA 1380
I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I
Db 1811 CAGTTGGGAAGGCTGTCCATCAAGTGTTCGGAGGAGCATTCCGCTCACTGTTTGGAGGCA 1870

Qy 1381 TGTCCTGGATCACACAGGGGCTTCTGGGAGCTCTTCTGCTGTGGATGGGAATTAACGCCC 1440
I II I I I I I I I II II II I I I I I I I I I I I III I I I I II I I I I I I I I I II I
Db 1871 TGTCCTGGATAACGCAAGGATTGCTGGGGGCTCTCCTGTTGTGGATGGGCATTAATGCTC 1930

Qy 1441 GTGACAGGTCAATTGCTATGACGTTCCTTGCGGTTGGAGGAGTCTTGCTCTTCCTTTCGG 1500
I I I I I I I I I II I II I I I I I I II II I I I I I II I I I I I I I I I I I I I I II I
Db 1931 GTGATAGGTCCATAGCTCTCACGTTTCTCGCAGTTGGAGGAGTTCTGCTCTTCCTCTCCG 1990

Qy 1501 TCAACGTCCATGCT 1514
I I I I II I I I I I I
Db 1991 TGAACGTGCATGCT 2004

West Nile virus
West Nile virus

Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;
Flavivirus; Japanese encephalitis virus group.
Location/Qualifiers

1..2004

/organism="West Nile virus"
/mol_type="genomic RNA"
/serotype="lineage I"
/isolation_source="serum"
/specific_host="Homo sapiens"
/db_xref="taxon:11082"
/country="Mexico: Sonora"
<1..>2004
/codon_start=1
/product="polyprotein"
/protein_id="AAY32589.1"
/db_xref="GI:63098702"

/translation="VTLSNFQGKVMMTWATDWDVITIPTAAGKNLCIVRAMDVGYM
CDDTITYECPVLSAGNDPEDIDCWCTKSAVYVRYGRCTKTRHSRRSRRSLTVQTHGES
TLANKKGAWMDSTKATRYLVKTESWILRNPGYALVAAVIGWMLGSNTMQRWFWLLL
LVAPAYSFNCLGMSNRDFLEGVSGATWVDLVLEGDSCVTIMSKDKPTIDVKMMNMEAA
NLAEVKSYCYLATVSDLSTKAACPTMGE^NDKRADPAFVCRQGVVDRGWGNGCGLFG
KGSIDTCAKFACSTKAIGRTILKENIKYEVAIFVHGPTTVESHGNYSTQAGATQAGRF
SITPAAPSYTLKLGEYGEVTVDCEPRSGIDTNAYYVMTVGTKTFLVHREWFMDLNLPW
SSAGSTWRNRETMEFEEPHATKQSVIALGSQEGALHQALAGAIPVEFSSNTVKLTS
GHLKCRVKMEKLQLKGTTYGVCSKAFKFLGTPADTGHGTVVLELQYTGTDGPCKVPIS
SVASLNDLTPVGRLVTWPFVSVATANAKVLIELEPPFGDSYIWGRGEQQINHHWHK
SGSSIGKAFTTTLKGAQRLAALGDTAWDFGSVGGVFTSVGKAVHQVFGGAFRSLFGGM
SWITQGLLGALLLWMGINARDRSIALTFIAVGGVLLFLSVNVHA"
466..966

/product="pre-membrane protein; prM/M"
967..>2004

/product="envelope protein"



ORIGIN 



Query Match 65.6%; Score 997.2; DB 10; Length 2004; 

Best Local Similarity 78.7%; Pred. No. 0; 

Matches 1191; Conservative 0; Mismatches 323; Indels 0; Gaps 0; 

Qy 1 CGGAATTCAGCTT CAACT GTTTAGGAAT GAGCAACAGGGACTTC CT GGAGGGAGT GTCTG 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I 

Db 491 CAGCT T ACAGCT TCAACT GC CT T GGAAT GAGCAAC AGAGACTT CTT GGAAGGAGT GT CT G 550 

Qy 61 GAGCTACATGGGTT GAT CTGGTACT GGAAGGAGAGAGTTGT GTGACCATAATGTCAAAAG 120 

II I I I I I I I I I I III I I I I II I I I I I I I I I I II I I I I I II I I I I I I I I I 

Db 551 GAGCAACAT GGGTGGATT T GGT T CT C GAAGGC GACAGCTGC GT GACT AT CAT GT CTAAAG 610 

Qy 121 ACAAGC CAACC ATT GAT GTCAAAAT GAT GAAC AT GGAAGC AGCT AAT CT C GCAGAT GT GC 180 

I I I I I I I I I I I I I I I I I II I I I I I I I I II I I I II II II II I I I I I II I 
Db 611 ACAAGC CTAC CATC GAT GTGAAGATGAT GAATAT GGAGGCGGCCAACCT GGCAGAGGTCC 670 

Qy 181 GTAGCTACTGCTACTTAGCTTCGGTCAGTGATCTGTCAACAAAAGCCGCGTGTCCAACCA 240 

I II II M I I I II III I I I I I I I I I I I II II I II II I I I I I II I I I I 
Db 671 GCAGTTATTGCTATTTGGCTACCGTCAGCGATCTCTCCACCAAAGCTGCGTGCCCGACCA 730 

Qy 241 TGGGT GAAGCT CACAAC GAGAAAAGAGC C GACCCT GC CTT T GT T T GCAAGCAAGGC GT CG 300 

I I I I I I I I I I I I I I I II III I II I I I I I II I I I I I I I I I I I I I I II I 
Db 731 T GGGAGAAGCT C ACAAT GACAAAC GT GCT GAC C CAGCTTT T GT GT GC AGACAAGGAGT GG 790 

Qy 301 TAGACAGAGGAT GGGGGAATGGATGCGGACT GTTT GGAAAGGGGAGCATT GACACATGTG 360 

I I I I I I II I I I I I II II I I I I I I I I I I I I I II II I I I I I I I I I II I I I I 
Db 791 TGGACAGGGGCT GGGGCAAC GGCTGC GGACT ATTT GGCAAAGGAAGCAT T GAC ACAT GCG 850 

Qy 361 CAAAGT TT GCCT GT ACAAC CAAGGCAACT GGTT GGATT AT CC AGAAGGAAAACAT CAAGT 420 

I I I I I I I I I I I I I I I I I I I I I I I II II III I I I I I I I I I I I I I I 
Db 851 CCAAGTTT GCCT GCTCTACCAAGGCAATAGGAAGAACCAT CTT GAAAGAGAAT AT CAAGT 910 

Qy 421 AC GAGGTT GCC AT AT T T GT GC AT GGC CC GAC GACT GT C GAAT C ACAT GGCAATT ATT CAA 480 

I I I I II I I I I I I I I I I I I I I I II II I I I I I II II II II II II II I 
Db 911 ACGAAGTGGCCATTTTTGTCCATGGACCAACTACTGTGGAGTCGCACGGAAACTACTCCA 970 

Qy 481 CACAGATAGGGGCTACCCAAGCAGGAAGGTTCAGCATAACT CCATCGGCACCAT CCTACA 540 

I I I I I II I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 
Db 971 CAC AGGCT GGAGC CACT CAGGCAGGGAGATT CAGCAT CACT CCT GC GGCGC CTT C AT ACA 1030 

Qy 541 CGCTGAAGTTGGGTGAGTAT GGT GAGGTCACAGTTGACT GTGAGCCACGGT CAGGAATAG 600 

I II III I II II I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I II I 
Db 1031 CACTAAAGCTT GGAGAAT AT GGAGAGGT GAC AGT GGACT GTGAACCACGGT CAGGGATTG 1090 

Qy 601 ACACTAGCGCTTACTACGTTATGTCAGTGGGTGCGAAGTCCTTCTTGGTTCACCGAGAAT 660 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1091 ACAC CAAT GCAT ACT ACGT GAT GACT GT T GGAACAAAGAC GT T CT T GGT C CAT C GT GAGT 1150 

Qy 661 GGTT TAT GGAC CT GAACCT T CCATGGAGT AGC GCT GGAAGC ACAAC GT GGAGGAACC GGG 720 

I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I I I I I I I I I I I I I I 
Db 1151 GGTT CAT GGAC CT C AAC CT CCCT T GGAGCAGT GCT GGAAGC ACT GT GT GGAGGAACAGAG 1210 

Qy 721 AAACACT GAT GGAGT T T GAAGAACCT C AT GC CAC C AAACAAT CT GT C GT AGCT CTAGGGT 780 

III I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I 

Db 1211 AGAC GT TAAT GGAGTT T GAGGAACC AC ACGC CAC GAAGC AGT CT GT GAT AGC ATT GG GCT 1270 

Qy 781 CGCAGGAAGGTGCCTTGCACCAAGCTCTGGCTGGAGCAATTCCTGTTGAGTTCTCAAGCA 840
I II M II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I
Db 1271 CACAAGAGGGAGCTCTGCATCAAGCTTTGGCTGGAGCCATTCCTGTGGAATTTTCAAGCA 1330

Qy 841 ACACTGTGAAGTTGACATCAGGACATCTGAAGTGTAGGGTGAAGATGGAGAAGTTGCAGC 900
I I I I I I I I I I I II I I II II III I I I I I I I I I I I I I I I I I I I I I II I I I I I I
Db 1331 ACACTGTCAAGTTGACGTCGGGTCATTTGAAGTGTAGAGTGAAGATGGAAAAATTGCAGT 1390

Qy 901 TGAAGGGAACAACATATGGTGTATGCTCAAAAGCATTCAAATTCGCTAGGACTCCCGCTG 960
I I I I I I I I I I I I I I I I I I II II I I I I I II I I I I I II I I I I I I I I I I I I
Db 1391 TGAAGGGAACAACCTATGGCGTCTGTTCAAAGGCTTTCAAGTTTCTTGGGACTCCCGCAG 1450

Qy 961 ACACTGGTCATGGAACGGTGGTGCTGGAACTGCAGTATACCGGAAAAGACGGGCCTTGCA 1020
I I I I I I I I I II II I I I I I I I I I I I I I I I I I I II II I II II I I I I I I
Db 1451 ACACAGGTCACGGCACTGTGGTGTTGGAATTGCAGTACACTGGCACGGATGGACCTTGTA 1510

Qy 1021 AAGTGCCCATTTCTTCTGTGGCTTCCCTGAACGACCTTACACCCGTTGGAAGGCTGGTGA 1080
I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 1511 AAGTTCCTATCTCGTCAGTGGCTTCATTGAACGACCTAACGCCAGTGGGCAGATTGGTCA 1570

Qy 1081 CTGTGAATCCATTTGTGTCTGTGGCTACGGCCAACTCGAAGGTTTTGATTGAACTCGAAC 1140
I I I I II II I I I I I II I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 1571 CTGTCAACCCTTTTGTTTCAGTGGCCACGGCCAACGCTAAGGTCCTGATTGAATTGGAAC 1630

Qy 1141 CCCCGTTTAGTGACTCTTACATCGTGGTGGGGAGAGGAGAACAGCAGATAAACCACCACT 1200
III Ml I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I
Db 1631 CACCCTTTGGAGACTCATACATAGTGGTGGGCAGAGGAGAACAACAGATCAATCACCATT 1690

Qy 1201 GGCACAAATCTGGGAGCAGTATTGGAAAGGCTTTCACCACTACACTCAGAGGAGCTCAAC 1260
I I I I I I I I I I I I I I I I I I I I I I II II II II II II I I I I I I I I I I II
Db 1691 GGCACAAGTCTGGAAGCAGCATTGGCAAAGCCTTTACAACCACCCTCAAAGGAGCGCAGA 1750

Qy 1261 GACTTGCAGCTCTTGGAGACACTGCCTGGGATTTTGGATCAGTCGGAGGGGTTTTCACCT 1320
I I I I II I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 1751 GACTAGCCGCTCTAGGAGACACAGCTTGGGACTTTGGATCAGTTGGAGGGGTGTTCACCT 1810

Qy 1321 CGGTAGGGAAAGCCATACACCAAGTTTTTGGAGGAGCCTTTAGATCACTCTTTGGAGGGA 1380
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 1811 CAGTTGGGAAGGCTGTCCATCAAGTGTTCGGAGGAGCATTCCGCTCACTGTTCGGAGGCA 1870

Qy 1381 TGTCCTGGATCACACAGGGGCTTCTGGGAGCTCTTCTGCTGTGGATGGGAATTAACGCCC 1440
I I I I I I I I I I II II II I I I I I I I I I I I III I I I I I I I I I I II II II I
Db 1871 TGTCCTGGATAACGCAAGGATTGCTGGGGGCTCTCCTGTTGTGGATGGGCATCAATGCTC 1930

Qy 1441 GTGACAGGTCAATTGCTATGACGTTCCTTGCGGTTGGAGGAGTCTTGCTCTTCCTTTCGG 1500
I I I I I I I I I II III I I I I I I II II I I I I I I I I I I I I I I I I I I I I I II I
Db 1931 GTGATAGGTCCATAGCTCTCACGTTTCTCGCAGTTGGAGGAGTTCTGCTCTTCCTCTCCG 1990

Qy 1501 TCAACGTCCATGCT 1514
I I I I I I I I I I I I
Db 1991 TGAACGTGCATGCT 2004



Search completed: June 12, 2006, 19:10:35 
Job time : 8801 secs

DR GO; GO:0019031; C: viral envelope; IEA.
DR GO; GO:0005524; F: ATP binding; IEA.
DR GO; GO:0008026; F: ATP-dependent helicase activity; IEA.
DR GO; GO:0003725; F: double-stranded RNA binding; IEA.
DR GO; GO:0003724; F: RNA helicase activity; IEA.
DR GO; GO:0003968; F: RNA-directed RNA polymerase activity; IEA.
DR GO; GO:0004252; F: serine-type endopeptidase activity; IEA.
DR GO; GO:0005198; F: structural molecule activity; IEA.
DR GO; GO:0019079; P: viral genome replication; IEA.
DR InterPro; IPR001410; DEAD.
DR InterPro; IPR011545; DEAD/DEAH_N.
DR InterPro; IPR011999; Flav_glyE_cen_dm.
DR InterPro; IPR001122; Flavi_capsidC.
DR InterPro; IPR011492; Flavi_DEAD.
DR InterPro; IPR000069; Flavi_M.
DR InterPro; IPR001157; Flavi_NS1.
DR InterPro; IPR000752; Flavi_NS2A.
DR InterPro; IPR000487; Flavi_NS2B.
DR InterPro; IPR000404; Flavi_NS4A.
DR InterPro; IPR001528; Flavi_NS4B.
DR InterPro; IPR000208; Flavi_NS5.
DR InterPro; IPR002535; Flavi_propep.
DR InterPro; IPR000336; Flv_glyE_Ig-like.
DR InterPro; IPR001650; Helicase_C.
DR InterPro; IPR001850; Peptidase_S7.
DR InterPro; IPR007095; RNA_pol_DS_PS.
DR InterPro; IPR007094; RNA_pol_PSvir.
DR InterPro; IPR002877; RrmJFtsJ_mtfrase.
DR InterPro; IPR011998; Vrl_glyE_cen_dim.
DR InterPro; IPR001680; WD40.
DR Pfam; PF01003; Flavi_capsid; 1.
DR Pfam; PF07652; Flavi_DEAD; 1.
DR Pfam; PF02832; Flavi_glycop_C; 1.
DR Pfam; PF00869; Flavi_glycoprot; 1.
DR Pfam; PF01004; Flavi_M; 1.
DR Pfam; PF00948; Flavi_NS1; 1.
DR Pfam; PF01005; Flavi_NS2A; 1.
DR Pfam; PF01002; Flavi_NS2B; 1.
DR Pfam; PF01350; Flavi_NS4A; 1.
DR Pfam; PF01349; Flavi_NS4B; 1.
DR Pfam; PF00972; Flavi_NS5; 1.
DR Pfam; PF01570; Flavi_propep; 1.
DR Pfam; PF01728; FtsJ; 1.
DR Pfam; PF00271; Helicase_C; 1.
DR Pfam; PF00949; Peptidase_S7; 1.



DR ProDom; PD001496; Flavi_NS1; 1.
DR SMART; SM00487; DEXDc; 1.
DR SMART; SM00490; HELICc; 1.
DR PROSITE; PS00678; WD_REPEATS_1; UNKNOWN_1.
KW Polyprotein.








FT CHAIN 1 123 capsid.
FT CHAIN 124 215 prM.
FT CHAIN 216 290 M.
FT CHAIN 291 791 envelope.
FT CHAIN 792 1144 NS1.
FT CHAIN 1145 1374 NS2A.
FT CHAIN 1375 1505 NS2B.
FT CHAIN 1506 2124 NS3.
FT CHAIN 2125 2273 NS4A.
FT CHAIN 2274 2528 NS4B.
FT CHAIN 2529 3433 NS5.
SQ SEQUENCE 3433 AA; 381124 MW; C302F24541A66BC8 CRC64;



Query Match 95.9%; Score 2531; DB 2; Length 3433; 

Best Local Similarity 95.4%; Pred. No. 3.8e-183; 

Matches 478; Conservative 14; Mismatches 9; Indels 0; Gaps 0; 

Qy 1 FNCLGMSNRDFLEGVSGATVAADLVXEGDSCWIMSKDK^ 60 

I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I 
Db 291 FNCLGMSNRDFLEGVSGATWVT>LVLEGDSCWIMSKDKP 350 

Qy 61 YLASVSDLSTKAACPTMGEIAHNEKRM)PAEVCKQGvArDRGWGNGCGLFGKGSIDTCAKFA 120 

I I I : I I I I I I I I I I I I I II I I I : I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 351 YLATVSDLSTFCAACPTMGEAHNDKRADPAFVCRQGVVDRGWGNGCGLFGKGSIDTCAKFA 410 

Qy 121 CTTKATGWIIQKENIKYEVAIFVHGPTTVESHGNYSTQIGATQAGRFSITPSAPSYTLKL 180 

1:11111 I I I I I I I I I I I I II I I I II I I I I I I I I I I I II I I I I I I I I I : II I I I I I I 
Db 411 CSTKATGRTILKENIKYEVAIFVHGPTTVESHGNYSTQIGATQAGRFSITPAAPSYTLKL 470 

Qy 181 GEYGEWVDCEPRSGIDTSAYYV^SVGAKSFLVTiREWFMDLNLPWSSAGSTTWRNRETLM 240 

! I I I I I I I I I I I I I I I I I : I I I M : I I I : I I II I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 471 GEYGEVTVDCEPRSGI DTNAYYVMTVGTKTF.LVHREWFMDLNLPWS SAGSTVWRNRETLM 530 

Qy 241 EFEEPHATKQSWALGSQEGALHQALAGAJPVT2FSSNTW 300 

I I I I I I I I II I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 531 EFEEPHATKQSVIALGSQEGALHQALAGAlPvTlFSSNTvl<LTSGHLKCRVT^MEKLQLKGT 590 

Qy 301 TYGVCSKAFKFARTPADTGHGTWLELQYTGKDGPCKVPISSVASLNDLTPVGRLVTVNP 360 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I 
Db 591 TYGVCSKAFKFLGTPADTGHGTWLELQYTGTDGPCKVPISSVASLNDLTPVGRLVTVNP 650 

Qy 361 FVSVATANSKVLIELEPPFSDSYIWGRGEQQINHHWHKSGSSIGKAFTTTLRGAQRLAA 420 

I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I 
Db 651 FVSVATANAKVljIELEPPFGDSYIWGRGEQQINHHWHKSGSSIGKAFTTTLKGAQRLAA 710 

Qy 421 LGDTAWDFGSVGGVFTSVGKAIHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 480 

I I II I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 
Db 711 LGDTAWDFGSVGGVFTSVGKAVHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 770 

Qy 481 IAMTFLAVGGVXLFLSVNVHA 501 

I I : I I I I I I I I I I I I I I I I I I 
Db 771 IALTFLAVGGVLLFLSVNVHA 791 



RESULT 13
Q8JU42_WNV
ID Q8JU42_WNV PRELIMINARY; PRT; 3433 AA.
AC Q8JU42;
DT 01-OCT-2002, integrated into UniProtKB/TrEMBL.
DT 01-OCT-2002, sequence version 1.
DT 07-FEB-2006, entry version 14.
DE Polyprotein.
OS West Nile virus (WN).
OC Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;
OC Flavivirus; Japanese encephalitis virus group.

OX NCBI_TaxID=11082; 

RN [1] 

RP NUCLEOTIDE SEQUENCE. 

RX MEDLINE=22089180; PubMed=12093177; DOI=10.1006/viro.2002.1449;

RA Lanciotti R.S., Ebel G.D., Deubel V., Kerst A.J., Murri S., Meyer R.,

RA Bowen M., McKinney N., Morrill W.E., Crabtree M.B., Kramer L.D.,

RA Roehrig J.T.;

RT "Complete genome sequences and phylogenetic analysis of West Nile

RT virus strains isolated from the United States, Europe, and the Middle

RT East.";

RL Virology 298:96-105(2002). 

CC 

CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms 

CC Distributed under the Creative Commons Attribution-NoDerivs License 

CC 

DR EMBL; AF404757; AAM81753.1; -; Genomic_RNA. 

DR HSSP; Q88653; 1L9K. 

DR SMR; Q8JU42; 25-97. 

DR GO; GO: 0016021; C: integral to membrane; IEA. 

DR GO; GO: 0019028; C: viral capsid; IEA. 

DR GO; GO: 0019031; C: viral envelope; IEA. 

DR GO; GO: 0005524; F: ATP binding; IEA. 

DR GO; GO: 0008026; F: ATP-dependent helicase activity; IEA. 

DR GO; GO:0003725; F: double-stranded RNA binding; IEA.

DR GO; GO: 0003724; F: RNA helicase activity; IEA. 

DR GO; GO: 0003968; F: RNA-directed RNA polymerase activity; IEA. 

DR GO; GO: 0004252; F: serine-type endopeptidase activity; IEA. 

DR GO; GO: 0005198; F: structural molecule activity; IEA. 

DR GO; GO: 0019079; P: viral genome replication; IEA. 

DR InterPro; IPR001410; DEAD. 

DR InterPro; IPR011545; DEAD/DEAH_N. 

DR InterPro; IPR011999; Flav_glyE_cen_dm. 

DR InterPro; IPR001122; Flavi_capsidC.

DR InterPro; IPR011492; Flavi_DEAD.

DR InterPro; IPR000069; Flavi_M.

DR InterPro; IPR001157; Flavi_NS1.

DR InterPro; IPR000752; Flavi_NS2A. 

DR InterPro; IPR000487; Flavi_NS2B. 

DR InterPro; IPR000404; Flavi_NS4A. 

DR InterPro; IPR001528; Flavi_NS4B. 

DR InterPro; IPR000208; Flavi_NS5. 

DR InterPro; IPR002535; Flavi_propep.

DR InterPro; IPR000336; Flv_glyE_Ig-like.

DR InterPro; IPR001650; Helicase_C.

DR InterPro; IPR001850; Peptidase_S7.

DR InterPro; IPR007095; RNA_pol_DS_PS.

DR InterPro; IPR007094; RNA_pol_PSvir.

DR InterPro; IPR002877; RrmJFtsJ_mtfrase.

DR InterPro; IPR011998; Vrl_glyE_cen_dim.

DR InterPro; IPR001680; WD40.

DR Pfam; PF01003; Flavi_capsid; 1. 

DR Pfam; PF07652; Flavi_DEAD; 1. 

DR Pfam; PF02832; Flavi_glycop_C; 1. 

DR Pfam; PF00869; Flavi_glycoprot; 1.

DR Pfam; PF01004; Flavi_M; 1.

DR Pfam; PF00948; Flavi_NS1; 1.



DR Pfam; PF01005; Flavi_NS2A; 1. 

DR Pfam; PF01002; Flavi_NS2B; 1. 

DR Pfam; PF01350; Flavi_NS4A; 1. 

DR Pfam; PF01349; Flavi_NS4B; 1. 

DR Pfam; PF00972; Flavi_NS5; 1. 

DR Pfam; PF01570; Flavi_propep; 1. 

DR Pfam; PF01728; FtsJ; 1. 

DR Pfam; PF00271; Helicase_C; 1. 

DR Pfam; PF00949; Peptidase_S7; 1.

DR ProDom; PD001496; Flavi_NS1; 1.

DR SMART; SM00487; DEXDc; 1.

DR SMART; SM00490; HELICc; 1.

DR PROSITE; PS00678; WD_REPEATS_1; UNKNOWN_1.

KW Polyprotein.

SQ SEQUENCE 3433 AA; 381210 MW; 1DFFCCDB2174B7EE CRC64; 

Query Match 95.9%; Score 2531; DB 2; Length 3433; 

Best Local Similarity 95.4%; Pred. No. 3.8e-183; 

Matches 478; Conservative 14; Mismatches 9; Indels 0; Gaps 0;

Qy 1 FNCLGMSNRDFLEGVSGATWVTDLVT^EGDSCVTIMSK 60 

I I I I I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I 
Db 291 FNCLGMSNRDFLEGVSGATWVDLVLEGDS CVT IMS KDKPT I DVKMMNMEAANLAEVRS YC 350 

Qy 61 YLASVSDLSTKAACPTMGEAHNEKRADPAFVCKQGWDRGWGNGCGLFGKGSIDTCAKFA 120 

I I I : I I I I I I II I I I I I I I I I I : I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 351 YLATVSDLSTKAACPTMGElAHNDKRADPAFVCRQGVVpRGWGNGCGLFGKGSIDTCAKFA 410 

Qy 121 CTTKATGWIIQKENIKYEVAIFVHGPTTVESHGNYSTQIGATQAGRFSITPSAPSYTLKL 180 

1:11111 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I 
Db 411 CSTKATGRTILKENIKYEVAIFVHGPTTVESHGNYSTQIGATQAGRFSITPAAPSYTLKL 470 

Qy 181 GEYGEWVI)CEPRSGIDTSAYYViyiSVGAKSFLVlIREWmDLNLPWSSAGSTTWRNRETLM 240 

I I I I I I I I I I I I I I I I I I : I I I I I : I I I : I I I I I I I I I I II I I I I I I I I I I I I I I I I I 
Db 471 GEYGEWV1DCEPRSGIDTNAYYVMTVGTKTFLVHREWFMDLNLPWSSAGSTWRNRETLM 530 

Qy 241 EFEEPHATKQSWALGSQEGALHQALAGAI PVEFSSNTVKLTSGHLKCRVKMEKLQLKGT 300 

I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 531 EFEEPHATKQSVIALGSQEGALHQA^GAIPVTIFSSNTVT^LTSGHLKCRVKMEKLQLKGT 590 

Qy 301 TYGVCSKAFKFARTPADTGHGTWLELQYTGKDGPCKVPISSVASLNDLTPVGRLVTVNP 360 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I 
Db 591 TYGVCSKAFKFLGTPADTGHGTWLELQYTGTDGPCKVPISSVASLNDLTPVGRLVTVNP 650 

Qy 361 FVSVATANSKVLIELEPPFSDSYIWGRGEQQINHHWHKSGSSIGKAFTTTLRGAQRLAA 420

I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I
Db 651 FVSVATANAKVLIELEPPFGDSYIWGRGEQQINHHWHKSGSSIGKAFTTTLKGAQRLAA 710

Qy 421 LGDTAWDFGSVGGVFTSVGKAIHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 480 

I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 711 LGDTAWDFGSVGGVFTSVGKAVHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 770 

Qy 481 IAMTFLAVGGVLLFLSVNVHA 501 

I I : I I I I I I I I I I I I I I I I I I 
Db 771 IALTFLAVGGVLLFLSVNVHA 791 



RESULT 14 
Q9EA21_WNV 

ID Q9EA21_WNV PRELIMINARY; PRT; 3433 AA. 

AC Q9EA21; 

DT 01-MAR-2001, integrated into UniProtKB/TrEMBL . 

DT 01-MAR-2001, sequence version 1. 

DT 07-FEB-2006, entry version 20. 

DE Polyprotein. 

OS West Nile virus (WN).

OC Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;

OC Flavivirus; Japanese encephalitis virus group.

OX NCBI_TaxID=11082; 

RN [1] 

RP NUCLEOTIDE SEQUENCE. 

RC STRAIN=RO97-50; 

RX MEDLINE=20014331; PubMed=10548295; 

RA Savage H.M., Ceianu C., Nicolescu G., Karabatsos N., Lanciotti R.,

RA Vladimirescu A., Laiv L., Ungureanu A., Romanca C., Tsai T.F.;

RT "Entomologic and avian investigations of an epidemic of West Nile 

RT fever in Romania in 1996, with serologic and molecular 

RT characterization of a virus isolate from mosquitoes."; 

RL Am. J. Trop. Med. Hyg. 61:600-611(1999). 

RN [2] 

RP NUCLEOTIDE SEQUENCE. 

RC STRAIN=RO97-50; 

RA Bowen M., Meyer R.F., McKinney N., Morrill W., Lanciotti R.;

RL Submitted (APR-2000) to the EMBL/GenBank/DDBJ databases.

CC

CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms 

CC Distributed under the Creative Commons Attribution-NoDerivs License 

CC 

DR EMBL; AF260969; AAG02040.1; -; Genomic_RNA. 

DR HSSP; Q88653; 1L9K. 

DR SMR; Q9EA21; 25-97. 

DR GO; GO: 0016021; C: integral to membrane; IEA. 

DR GO; GO: 0019028; C: viral capsid; IEA. 

DR GO; GO: 0019031; C: viral envelope; IEA. 

DR GO; GO: 0005524; F: ATP binding; IEA. 

DR GO; GO:0008026; F: ATP-dependent helicase activity; IEA. 

DR GO; GO:0003725; F: double-stranded RNA binding; IEA. 

DR GO; GO: 0003724; F: RNA helicase activity; IEA. 

DR GO; GO:0003968; F: RNA-directed RNA polymerase activity; IEA.

DR GO; GO: 0004252; F: serine-type endopeptidase activity; IEA. 

DR GO; GO: 0005198; F: structural molecule activity; IEA. 

DR GO; GO: 0019079; P: viral genome replication; IEA. 

DR InterPro; IPR001410; DEAD. 

DR InterPro; IPR011545; DEAD/DEAH_N.

DR InterPro; IPR011999; Flav_glyE_cen_dm. 

DR InterPro; IPR001122; Flavi_capsidC. 

DR InterPro; IPR011492; Flavi_DEAD. 

DR InterPro; IPR000069; Flavi_M. 

DR InterPro; IPR001157; Flavi_NS1.

DR InterPro; IPR000752; Flavi_NS2A. 

DR InterPro; IPR000487; Flavi_NS2B. 

DR InterPro; IPR000404; Flavi_NS4A. 

DR InterPro; IPR001528; Flavi_NS4B. 

DR InterPro; IPR000208; Flavi_NS5.



DR InterPro; IPR002535; Flavi_propep.

DR InterPro; IPR000336; Flv_glyE_Ig-like.

DR InterPro; IPR001650; Helicase_C.

DR InterPro; IPR001850; Peptidase_S7.

DR InterPro; IPR007095; RNA_pol_DS_PS.

DR InterPro; IPR007094; RNA_pol_PSvir.

DR InterPro; IPR002877; RrmJFtsJ_mtfrase.

DR InterPro; IPR011998; Vrl_glyE_cen_dim. 

DR InterPro; IPR001680; WD40. 

DR Pfam; PF01003; Flavi_capsid; 1. 

DR Pfam; PF07652; Flavi_DEAD; 1. 

DR Pfam; PF02832; Flavi_glycop_C; 1. 

DR Pfam; PF00869; Flavi_glycoprot; 1.

DR Pfam; PF01004; Flavi_M; 1.

DR Pfam; PF00948; Flavi_NS1; 1.

DR Pfam; PF01005; Flavi_NS2A; 1. 

DR Pfam; PF01002; Flavi_NS2B; 1. 

DR Pfam; PF01350; Flavi_NS4A; 1. 

DR Pfam; PF01349; Flavi_NS4B; 1. 

DR Pfam; PF00972; Flavi_NS5; 1. 

DR Pfam; PF01570; Flavi_propep; 1. 

DR Pfam; PF01728; FtsJ; 1. 

DR Pfam; PF00271; Helicase_C; 1. 

DR Pfam; PF00949; Peptidase_S7; 1.

DR ProDom; PD001496; Flavi_NS1; 1.

DR SMART; SM00487; DEXDc; 1.

DR SMART; SM00490; HELICc; 1.

DR PROSITE; PS00678; WD_REPEATS_1; UNKNOWN_

KW Polyprotein. 



FT CHAIN 1 123 nucleocapsid protein C.
FT CHAIN 124 215 pre-membrane protein prM.
FT CHAIN 216 290 membrane protein M.
FT CHAIN 291 791 envelope glycoprotein E.
FT CHAIN 792 1143 non-structural protein 1 NS1.
FT CHAIN 1144 1374 non-structural protein 2A NS2A.
FT CHAIN 1375 1505 non-structural protein 2B NS2B.
FT CHAIN 1506 2124 non-structural protein 3 NS3.
FT CHAIN 2125 2273 non-structural protein 4A NS4A.
FT CHAIN 2274 2528 non-structural protein NS4B.
FT CHAIN 2529 3433 non-structural protein NS5.
SQ SEQUENCE 3433 AA; 381256 MW; 4695F8911670DF2A CRC64;



Query Match 95.9%; Score 2531; DB 2; Length 3433; 

Best Local Similarity 95.4%; Pred. No. 3.8e-183; 

Matches 478; Conservative 14; Mismatches 9; Indels 0; Gaps 0;

Qy 1 FNCLGMSNRDFLEGVSGATWVTDLVTjEGDSCVTIMSKDKPTIDVKMMNMEAANLADW 60
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I
Db 291 FNCLGMSNRDFLEGVSGATWV1)LVLEGDSCWIMSKDKPTIDvT<MMNMEAANLAEW 350

Qy 61 YLASVSDLSTKAACPTMGEAHNEKRADPAFVCKQGWDRGWGNGCGLFGKGSIDTCAKFA 120
I I I : I I I I I I I I I I I I I I I I I I : I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 351 YLATVSDLSTKAACPTMGEAHNDKRADPAFVCRQGWDRGWGNGCGLFGKGSIDTCAKFA 410

Qy 121 CTTKATGWIIQKENIKYEVAIFVHGPTTVESHGNYSTQIGATQAGRFSITPSAPSYTLKL 180
I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I : I I I I I I I I
Db 411 CSTKATGRTILKENIKYEVAIFVHGPTTVESHGNYSTQIGATQAGRFSITPAAPSYTLKL 470

Qy 181 GEYGEVTVDCEPRSGIDTSAYYVMSVGAKSFLVHREWFMDLNLPWSSAGSTTWRNRETLM 240
I I I I I I I I I I I I I I I I I I : I I I I I : I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 471 GEYGEVTVDCEPRSGIDTNAYYVMTVGTKTFLVHREWFMDLNLPWSSAGSTVWRNRETLM 530



Qy 241 EFEEPHATKQSWALGSQEGALHQALAGAIPVEFSSNTVKLTSGHLKCRVKMEKLQLKGT 300 

I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II 
Db 531 EFEEPHATKQSVIALGSQEGALHQALAGAI PVEFSSNTVKLTSGHLKCRVKMEKLQLKGT 590 

Qy 301 TYGVCSKAFKFARTPADTGHGTVVLELQYTGKDGPCKVPISSVASLNDLTPVGRLVTVNP 360

I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 591 TYGVCSKAFKFLGTPADTGHGTWLELQYTGTDGPCKVPISSVASLNDLTPVGRLVTVNP 650 

Qy 361 FVSVATANSKVLIELEPPFSDSYIWGRGEQQINHHWHKSGSSIGKAFTTTLRGAQRLAA 420 

I I I I I I I I : I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I : I I I I I I I 
Db 651 FVSVATANAKVLIELEPPFGDSYIWGRGEQQINHHWHKSGSSIGKAFTTTLKGAQRLAA 710 

Qy 421 LGDTAWDFGSVGGVFTSVGKAIHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 480 

M I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 711 LGDTAWDFGSVGGVFTSVGKAVHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 770

Qy 481 IAMTFLAVGGVLLFLSVNVHA 501 

I I : I I I I I I I I I I I I I I I I I I 
Db 771 IALTFLAVGGVLLFLSVNVHA 791 



RESULT 15 
Q9WHD2_WNV 

ID Q9WHD2_WNV PRELIMINARY; PRT; 773 AA. 

AC Q9WHD2; 

DT 01-NOV-1999, integrated into UniProtKB/TrEMBL . 

DT 01-NOV-1999, sequence version 1. 

DT 07-FEB-2006, entry version 24. 

DE Polyprotein (Fragment).

OS West Nile virus (WN).

OC Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;

OC Flavivirus; Japanese encephalitis virus group.

OX NCBI_TaxID=11082; 

RN [1] 

RP NUCLEOTIDE SEQUENCE. 

RC STRAIN=96-1030; 

RX MEDLINE=98407299; PubMed=9737281; DOI=10.1016/S0140-6736(98)03538-7;

RA Tsai T.F., Popovici F., Cernescu C., Campbell G.L., Nedelcu N.I.;

RT "West Nile encephalitis epidemic in southeastern Romania."; 

RL Lancet 352:767-771(1998). 

RN [2] 

RP NUCLEOTIDE SEQUENCE. 

RC STRAIN=96-1030; 

RA Lanciotti R.L., Ludwig M.L., Savage H.M.; 

RL Submitted (FEB-1999) to the EMBL/GenBank/DDBJ databases.

CC

CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms 

CC Distributed under the Creative Commons Attribution-NoDerivs License 

CC 

DR EMBL; AF130363; AAD28624.1; -; Genomic_RNA. 

DR HSSP; Q88653; 10KE. 

DR SMR; Q9WHD2; 1-72. 



DR GO; GO: 0016021; C: integral to membrane; IEA. 

DR GO; GO: 0019028; C: viral capsid; IEA. 

DR GO; GO: 0019031; C: viral envelope; IEA. 

DR GO; GO: 0005198; F: structural molecule activity; IEA. 

DR GO; GO: 0019058; P: viral infectious cycle; IEA. 

DR InterPro; IPR011999; Flav_glyE_cen_dm. 

DR InterPro; IPR001122; Flavi_capsidC . 

DR InterPro; IPR000069; Flavi_M. 

DR InterPro; IPR002535; Flavi_propep . 

DR InterPro; IPR000336; Flv_glyE_Ig-like . 

DR InterPro; IPR011998; Vrl_glyE_cen_dim. 

DR Pfam; PF01003; Flavi_capsid; 1. 

DR Pfam; PF02832; Flavi_glycop_C; 1. 

DR Pfam; PF00869; Flavi_glycoprot; 1. 

DR Pfam; PF01004; Flavi_M; 1. 

DR Pfam; PF01570; Flavi_propep; 1. 

KW Polyprotein. 

FT CHAIN <1 88 capsid protein. 

FT CHAIN 89 265 pre-membrane/membrane protein.

FT CHAIN 266 766 envelope glycoprotein.

FT NON_TER 1 1

FT NON_TER 773 773

SQ SEQUENCE 773 AA; 83362 MW; 2960B1E9AF064BF6 CRC64 ; 

Query Match 95.9%; Score 2529; DB 2; Length 773; 

Best Local Similarity 95.4%; Pred. No. 7.3e-184; 

Matches 478; Conservative 13; Mismatches 10; Indels 0; Gaps 0; 
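
The summary figures above are consistent with one simple reading, sketched here in Python. It assumes that "Best Local Similarity" is the number of identical positions divided by the total number of alignment columns (matches + conservative + mismatches + indels). That definition is inferred from the numbers printed in this report, not taken from the search program's documentation, and the function name is hypothetical.

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent identity over all alignment columns (assumed definition)."""
    columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / columns, 1)

# Figures reported for RESULT 15 (Q9WHD2_WNV): 478 / 13 / 10 / 0.
similarity = best_local_similarity(478, 13, 10, 0)
print(similarity)
```

Under the same assumption, the column total (501) equals the query length shown by the last Qy coordinate of the alignment.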

Qy 1 FNCLGMSNRDFLEGVSGATWvT)LVTjEGDSCVTIMSK^ 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I : I I I I I 
Db 266 FNCLGMSNRDFLEGVSGATWvT)LVLEGDSCVTIMSKDKPTIDV^^ 325 

Qy 61 YLASVSDLSTKAACPTMGEAHNEKRADPAFVCKQGWDRGWGNGCGLFGKGSIDTCAKFA 120 

I II : I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I II 
Db 326 YLATVSDLSTKAACPTMGEAHNDKRADPAFVCKQGWDRGWGNGCGLFGKGSIDTCAKFA 385 

Qy 121 CTTKATGWIIQKENIKYEVAIFVHGPTTVESHGNYSTQIGATQAGRFSITPSAPSYTLKL 180 

I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I 
Db 386 CSTKATGRTILKENIKYEVAIFVHGPTTVESHGNYPTQIGATQAGRFSITPAAPSYTLKL 445 

Qy 181 GEYGEVTVDCEPRSGIDTSAYYVMSVGAKSFLVHREWFMDLNLPWSSAGSTTWRNRETLM 240

I I I I I I I I I I I I I I I I I I : I I I I I : I I I : I I I I I I I I I I I I I I I I I I I II I I I I I I I I 
Db 446 GEYGEWvT>CEPRSGIDTNAYYvTfTVGTKTFLVHREWFMDLNLPWSSAGSTWRNRETLM 505 

Qy 241 EFEEPHATKQSWALGSQEGALHQALAGAIPVEFSSNTVKLTSGHLKCRVKMEKLQLKGT 300

I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 506 EFEEPHATKQSVIALGSQEGALHQALAGAIPVEFSSNTVKLTSGHLKCRVKMEKLQLKGT 565

Qy 301 TYGVCSKAFKFARTPADTGHGTVVLELQYTGKDGPCKVPISSVASLNDLTPVGRLVTVNP 360 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 566 TYGVCSKAFKFLGTPADTGHGTWLELQYTGTDGPCKVPISSVASLNDLTPVGRLVTVNP 625 

Qy 361 FVSVATANSKVLIELEPPFSDSYIWGRGEQQINHHWHKSGSSIGKAFTTTLRGAQRLAA 420 

I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I 
Db 626 FVSVATANAKVLIELEPPFGDSYIWGRGEQQINHHWHKSGSSIGKAFTTTLKGAQRLAA 685 



Qy 421 LGDTAWDFGSVGGVFTSVGKAIHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 480

Db 686 LGDTAWDFGSVGGVFTSVGKAVHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 745

Qy 481 IAMTFLAVGGVLLFLSVNVHA 501 

I I : I I I I I I I I I I I I I I I I I I 
Db 746 IALTFLAVGGVLLFLSVNVHA 766 
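
The Db coordinates of this alignment (266 to 766) can be checked against the CHAIN features listed for the entry. The Python sketch below is only an illustration: the feature tuples are transcribed by hand from the FT lines of Q9WHD2_WNV, and the helper name `chain_for` is hypothetical.

```python
# (description, start, end), transcribed from the FT CHAIN lines of Q9WHD2_WNV.
CHAINS = [
    ("capsid protein", 1, 88),
    ("pre-membrane/membrane protein", 89, 265),
    ("envelope glycoprotein", 266, 766),
]

def chain_for(start, end, chains=CHAINS):
    """Return the descriptions of chains that fully contain start..end."""
    return [name for name, lo, hi in chains if lo <= start and end <= hi]

# The alignment runs from Db 266 to Db 766.
covering = chain_for(266, 766)
print(covering)
```

The aligned region coincides with the envelope glycoprotein chain, and its length (766 - 266 + 1 = 501) equals the query length.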



Search completed: June 10, 2006, 02:43:07 
Job time : 303 secs






RC STRAIN=ArB3573/82;

RA Borisevich V.G., Seregin A.V., Yamshchikov V.F.;

RT "Genetic determinants of West Nile virus pathogenicity.";

RL Submitted (DEC-2005) to the EMBL/GenBank/DDBJ databases.

CC

CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms

CC Distributed under the Creative Commons Attribution-NoDerivs License

CC

DR EMBL; DQ318020; ABC49717.1; -; mRNA.

KW Polyprotein; Signal.






FT SIGNAL 106 123 Potential.
FT SIGNAL 290 792 Potential.
FT SIGNAL 2251 2273 Potential.
FT CHAIN 1 105 C protein.
FT CHAIN 106 290 prM protein.
FT CHAIN 124 215 cleaved amino terminal prM fragment.
FT CHAIN 216 290 M protein.
FT CHAIN 291 791 E protein.
FT CHAIN 792 1143 NS1 protein.
FT CHAIN 1144 1374 NS2A protein.
FT CHAIN 1375 1505 NS2B protein.
FT CHAIN 1506 2124 NS3 protein.
FT CHAIN 2125 2273 NS4A protein.
FT CHAIN 2274 2529 NS4B protein.
FT CHAIN 2530 3434 NS5 protein.
SQ SEQUENCE 3434 AA; 380337 MW; DF4C043FCA4F25DE CRC64;



Query Match 98.5%; Score 2599; DB 2; Length 3434;

Best Local Similarity 98.8%; Pred. No. 2.5e-188;

Matches 495; Conservative 3; Mismatches 3; Indels 0; Gaps 0;



Qy 1 FNCLGMSNRDFLEGVSGATWvTLVXEGDSCVTIMSKDKPTIDV 60

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 291 FNCLGMSfoRDFLEGVSGATWVTJLvT^EGDSCvTTIMSK 350

Qy 61 YLASVSDLSTKAACPTMGEAHNEKRADPAFVCKQGWDRGWGNGCGLFGKGSIDTCAKFA 120

I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 351 YLASVSDLSTRAACPTMGEAHNEKRADPAFVCKQGWDRGWGNGCGLFGKGSIDTCAKFA 410

Qy 121 CTTKATGWIIQKENIKYEVAIFVHGPTTVESHGNYSTQIGATQAGRFSITPSAPSYTLKL 180

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 411 CTTKATGWIIQKENIKYEVAIFVHGPTTVESHGDYSTQIGATQAGRFSITPSAPSYTLKL 470

Qy 181 GEYGEVTVDCEPRSGIDTSAYYVMSVGAKSFLVHREWFMDLNLPWSSAGSTTWRNRETLM 240

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 471 GEYGEWVT)CEPRSGIDTSAYYVL^SVGAKSFLVLIREWFMDLNLPWSSAGSTTWRNRETLM 530

Qy 241 EFEEPHATKQSWALGSQEGALHQALAGAIPVEFSSNTVKLTSGHLKCRVKMEKLQLKGT 300

I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 531 EFEEPHATKRSWALGSQEGALHQALAGAIPV^FSSNTV^LTSGHLKCRVKMEKLQLKGT 590

Qy 301 TYGVCSKAFKFARTPADTGHGTVVLELQYTGKDGPCKVPISSVASLNDLTPVGRLVTVNP 360

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Db 591 TYGVCSKAFKFAGTPADTGHGTWLELQYTGTDGPCKVPISSVASLNDLTPVGRLVTVNP 650

Qy 361 FVSVATANSKVLIELEPPFSDSYIWGRGEQQINHHWHKSGSSIGKAFTTTLRGAQRLAA 420

I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 651 FVSVATANSKVLIELEPPFGDSYIWGRGEQQINHHWHKSGSSIGKAFTTTLRGAQRLAA 710

Qy 421 LGDTAWDFGSVGGVFTSVGKAIHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 480

I I I I I I I II I I I I I I I I I I I I I I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 711 LGDTAWDFGSVGGVFTSVGKAIHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 770 

Qy 481 IAMTFLAVGGVLLFLSVNVHA 501 

I I I I I I I I I I I I I I I I I I I I I 
Db 771 IAMTFLAVGGVLLFLSVNVHA 791 

RESULT 4 
Q5MXE3_WNV 

ID Q5MXE3_WNV PRELIMINARY; PRT; 3430 AA. 

AC Q5MXE3; 

DT 01-FEB-2005, integrated into UniProtKB/TrEMBL.

DT 01-FEB-2005, sequence version 1. 

DT 07-FEB-2006, entry version 4. 

DE Polyprotein. 

OS West Nile virus (WN).

OC Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;

OC Flavivirus; Japanese encephalitis virus group.

OX NCBI_TaxID=11082; 

RN [1] 

RP NUCLEOTIDE SEQUENCE. 

RC STRAIN=B956; 

RX PubMed=15527855; DOI=10.1016/j.virol.2004.09.014;

RA Yamshchikov G., Borisevich V., Seregin A., Chaporgina E., Mishina M.,

RA Mishin V., Wai Kwok C., Yamshchikov V.;

RT "An attenuated West Nile prototype virus is highly immunogenic and 

RT protects against the deadly NY99 strain: a candidate for live WN 

RT vaccine development."; 

RL Virology 330:304-312(2004). 

RN [2] 

RP NUCLEOTIDE SEQUENCE. 

RC STRAIN=B956; 

RA Borisevich V.G., Yamshchikov V.F.; 

RT "Molecular basis of attenuation of the West Nile virus prototype 

RT strain B956."; 

RL Submitted (JAN-2004) to the EMBL/GenBank/DDBJ databases.

CC 

CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms 

CC Distributed under the Creative Commons Attribution-NoDerivs License 

CC 

DR EMBL; AY532665; AAT02759.1; -; Genomic_RNA. 

DR SMR; Q5MXE3; 25-97. 

DR GO; GO:0016021; C:integral to membrane; IEA. 

DR GO; GO:0019028; C:viral capsid; IEA.

DR GO; GO:0019031; C:viral envelope; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0008026; F:ATP-dependent helicase activity; IEA.

DR GO; GO:0003725; F:double-stranded RNA binding; IEA.

DR GO; GO:0003724; F:RNA helicase activity; IEA.

DR GO; GO:0003968; F:RNA-directed RNA polymerase activity; IEA.

DR GO; GO:0004252; F:serine-type endopeptidase activity; IEA.

DR GO; GO:0005198; F:structural molecule activity; IEA.

DR GO; GO:0019079; P:viral genome replication; IEA.



DR InterPro; IPR001410; DEAD.
DR InterPro; IPR011545; DEAD/DEAH_N.
DR InterPro; IPR011999; Flav_glyE_cen_dm.
DR InterPro; IPR001122; Flavi_capsidC.
DR InterPro; IPR011492; Flavi_DEAD.
DR InterPro; IPR000069; Flavi_M.
DR InterPro; IPR001157; Flavi_NS1.
DR InterPro; IPR000752; Flavi_NS2A.
DR InterPro; IPR000487; Flavi_NS2B.
DR InterPro; IPR000404; Flavi_NS4A.
DR InterPro; IPR001528; Flavi_NS4B.
DR InterPro; IPR000208; Flavi_NS5.
DR InterPro; IPR002535; Flavi_propep.
DR InterPro; IPR000336; Flv_glyE_Ig-like.
DR InterPro; IPR001650; Helicase_C.
DR InterPro; IPR001850; Peptidase_S7.
DR InterPro; IPR007095; RNA_pol_DS_PS.
DR InterPro; IPR007094; RNA_pol_PSvir.
DR InterPro; IPR002877; RrmJFtsJ_mtfrase.
DR InterPro; IPR011998; Vrl_glyE_cen_dim.
DR InterPro; IPR001680; WD40.
DR Pfam; PF01003; Flavi_capsid; 1.
DR Pfam; PF07652; Flavi_DEAD; 1.
DR Pfam; PF02832; Flavi_glycop_C; 1.
DR Pfam; PF00869; Flavi_glycoprot; 1.
DR Pfam; PF01004; Flavi_M; 1.
DR Pfam; PF00948; Flavi_NS1; 1.
DR Pfam; PF01005; Flavi_NS2A; 1.
DR Pfam; PF01002; Flavi_NS2B; 1.
DR Pfam; PF01350; Flavi_NS4A; 1.
DR Pfam; PF01349; Flavi_NS4B; 1.
DR Pfam; PF00972; Flavi_NS5; 1.
DR Pfam; PF01570; Flavi_propep; 1.
DR Pfam; PF01728; FtsJ; 1.
DR Pfam; PF00271; Helicase_C; 1.
DR Pfam; PF00949; Peptidase_S7; 1.
DR ProDom; PD001496; Flavi_NS1; 1.
DR SMART; SM00487; DEXDc; 1.
DR SMART; SM00490; HELICc; 1.
DR PROSITE; PS00678; WD_REPEATS_1; UNKNOWN_1.
KW Polyprotein.
SQ SEQUENCE 3430 AA; 379894 MW; 6298C302480200D8 CRC64;



Query Match 97.6%; Score 2575; DB 2; Length 3430;

Best Local Similarity 98.2%; Pred. No. 1.7e-186;

Matches 492; Conservative 3; Mismatches 2; Indels 4; Gaps 1;



Qy 1 FNCLGMSNRDFLEGVSGATWVI)LVLEGDSCWIMSKDKPTIDVKMMNMEAANL^ 60

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I : I I I I I I I I I I I I II I I I I I I I I I I I I I
Db 291 FNCLGMSNRDFLEGVSGATWVT>LVXEGDSCWLMSKDKPTIDVKMMNMEA 350

Qy 61 YLASVSDLSTKAACPTMGEAHNEKRADPAFVCKQGWDRGWGNGCGLFGKGSIDTCAKFA 120

I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I
Db 351 YLASVSDLSTRAACPTMGEAHNEKRADPAFVCKQGWDRGWGNGCGLFGKGSIDTCAKFA 410

Qy 121 CTTKATGWIIQKENIKYEVAIFVHGPTTVESHGNYSTQIGATQAGRFSITPSAPSYTLKL 180

I I I I I I I I I I I I I I II II I I I I I I I I I I I I I M : I I I I I I I I I I I I I I I I I I I I II
Db 411 CTTKATGWIIQKENIKYEVAIFVHGPTTVESHG----KIGATQAGRFSITPSAPSYTLKL 466

Qy 181 GEYGEVTVDCEPRSGIDTSAYYVMSVGAKSFLVHREWFMDLNLPWSSAGSTTWRNRETLM 240

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1
Db 467 GEYGEWVDCEPRSGIDTSAYYVMSVGAKSFLVHREWFMDLNLPWSSAGSTTWRNRETLM 526

Qy 241 EFEEPHATKQSWALGSQEGALHQALAGAIPVEFSSNTVKLTSGHLKCRVKMEKLQLKGT 300

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Db 527 EFEEPHATKQSWALGSQEGALHQALAGAIPVEFSSNTVKLTSGHLKCRVKMEKLQLKGT 586

Qy 301 TYGVCSKAFKFARTPADTGHGTWLELQYTGKDGPCKVPISSVASLNDLTPVGRLVTVNP 360

1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Db 587 TYGVCSKAFKFARTPADTGHGTWLELQYTGTDGPCKVPISSVASLNDLTPVGRLVTVNP 646

Qy 361 FVSVATANSKVLIELEPPFSDSYIWGRGEQQINHHWHKSGSSIGKAFTTTLRGAQRLAA 420

1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Db 647 FVSVATANSKVLIELEPPFGDSYIWGRGEQQINHHWHKSGSSIGKAFTTTLRGAQRLAA 706

Qy 421 LGDTAWDFGSVGGVFTSVGKAIHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 480

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Db 707 LGDTAWDFGSVGGVFTSVGKAIHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 766

Qy 481 IAMTFLAVGGVLLFLSVNVHA 501

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Db 767 IAMTFLAVGGVLLFLSVNVHA 787





RESULT 5
Q2PMF5_WNV
ID Q2PMF5_WNV PRELIMINARY; PRT; 3430 AA.
AC Q2PMF5;
DT 24-JAN-2006, integrated into UniProtKB/TrEMBL.
DT 24-JAN-2006, sequence version 1.
DT 07-FEB-2006, entry version 2.
DE Polyprotein precursor.
OS West Nile virus (WN).
OC Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;
OC Flavivirus; Japanese encephalitis virus group.
OX NCBI_TaxID=11082;
RN [1]
RP NUCLEOTIDE SEQUENCE.
RC STRAIN=ArD76104;
RA Borisevich V.G., Seregin A.V., Yamshchikov V.F.;
RT "Genetic determinants of West Nile virus pathogenicity.";
RL Submitted (DEC-2005) to the EMBL/GenBank/DDBJ databases.
CC
CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
CC Distributed under the Creative Commons Attribution-NoDerivs License
CC
DR EMBL; DQ318019; ABC49716.1; -; mRNA.
KW Polyprotein; Signal.
FT SIGNAL 106 123 Potential.
FT SIGNAL 275 290 Potential.
FT SIGNAL 764 787 Potential.
FT SIGNAL 2247 2269 Potential.
FT CHAIN 1 105 C protein.
FT CHAIN 124 215 cleaved amino
FT CHAIN 124 290 prM protein.
FT CHAIN 216 290 M protein.
FT CHAIN 291 787 E protein.
FT CHAIN 788 1139 NS1 protein.
FT CHAIN 1140 1370 NS2A protein.
FT CHAIN 1371 1501 NS2B protein.
FT CHAIN 1502 2120 NS3 protein.
FT CHAIN 2121 2269 NS4A protein.
FT CHAIN 2270 2525 NS4B protein.
FT CHAIN 2526 3430 NS5 protein.
SQ SEQUENCE 3430 AA; 379866 MW; B03CBB31C86FD33B CRC64;



Query Match 97.5%; Score 2573; DB 2; Length 3430; 

Best Local Similarity 98.2%; Pred. No. 2.4e-186; 

Matches 492; Conservative 3; Mismatches 2; Indels 4; Gaps 1; 



Qy 1 FNCLGMSNRDFLEGVSGATWVT)LvTjEGDSCVTIMSKDKPTIDv^ 60

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Db 291 FNCLGMSNRDFLEGVSGATWVT)LvXEGDSCVTIMSKDKPTIDVT^^ 350

Qy 61 YLASVSDLSTKAACPTMGEAHNEKRADPAFVCKQGVVDRGWGNGCGLFGKGSIDTCAKFA 120

1 1 1 1 1 1 1 1 1 1 : 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Db 351 YLASVSDLSTRAACPTMGE1AHNEKRADPAFVCKQGVVDRGWGNGCGLFGKGSIDTCAKFA 410

Qy 121 CTTKATGWIIQKENIKYEVAIFVHGPTTVESHGNYSTQIGATQAGRFSITPSAPSYTLKL 180

1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 : 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Db 411 CTTKATGWIIQKENIKYEVAIFVHGPTTVESHG----KIGATQAGRFSITPSAPSYTLKL 466

Qy 181 GEYGEVTVDCEPRSGIDTSAYYVMSVGAKSFLVHREWFMDLNLPWSSAGSTTWRNRETLM 240

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 :
Db 467 GEYGEVTVDCEPRSGIDTSAYYVMSVGAKSFLVHREWFMDLNLPWSSAGSTTWRNRETLV 526

Qy 241 EFEEPHATKQSWALGSQEGALHQALAGAIPVEFSSNTVKLTSGHLKCRVKMEKLQLKGT 300

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1
Db 527 EFEEPHATKQSWALGSQEGALHQALAGAIPVEFSSNTVKLTSGHLKCRVKMEKLQLKGT 586

Qy 301 TYGVCSKAFKFARTPADTGHGTWLELQYTGKDGPCKVPISSVASLNDLTPVGRLVTVNP 360

1 II 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 II 1 1 I I II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Db 587 TYGVCSKAFKFARTPADTGHGTVVLELQYTGTDGPCKVPISSVASLNDLTPVGRLVTVNP 646

Qy 361 FVSVATANSKVLIELEPPFSDSYIWGRGEQQINHHWHKSGSSIGKAFTTTLRGAQRLAA 420

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Db 647 FVSVATANSKVTiIELEPPFGDSYIWGRGEQQINHHWHKSGSSIGKAFTTTLRGAQRLAA 706

Qy 421 LGDTAWDFGSVGGVFTSVGKAIHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 480

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Db 707 LGDTAWDFGSVGGVFTSVGKAIHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 766

Qy 481 IAMTFLAVGGVLLFLSVNVHA 501

1 1 1 1 1 1 1 I 1 1 1 1 1 1 1 1 1 1 1 1 1
Db 767 IAMTFLAVGGVLLFLSVNVHA 787





RESULT 6 
POLG_WNV

ID POLG_WNV STANDARD; PRT; 3430 AA. 

AC P06935; 



DT 01-JAN-1988, integrated into UniProtKB/Swiss-Prot.

DT 24-OCT-2003, sequence version 2. 

DT 07-MAR-2006, entry version 64. 

DE Genome polyprotein [Contains: Capsid protein C (Core protein); 

DE Envelope protein M (Matrix protein); Major envelope protein E;

DE Nonstructural protein 1 (NS1); Nonstructural protein 2A (NS2A);

DE Flavivirin protease NS2B regulatory subunit; Flavivirin protease NS3

DE catalytic subunit (EC 3.4.21.91); Nonstructural protein 4A (NS4A);

DE Nonstructural protein 4B (NS4B); RNA-directed RNA polymerase

DE (EC 2.7.7.48) (NS5)].

OS West Nile virus (WN).

OC Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;

OC Flavivirus; Japanese encephalitis virus group.

OX NCBI_TaxID=11082; 

RN [1] 

RP NUCLEOTIDE SEQUENCE [GENOMIC RNA].

RX MEDLINE=86124703; PubMed=3753811;

RA Castle E., Leidner U., Nowak T., Wengler G., Wengler G.;

RT "Primary structure of the West Nile flavivirus genome region coding

RT for all nonstructural proteins.";

RL Virology 149:10-26(1986).

RN [2] 

RP SEQUENCE REVISION TO 1908; 2018-2036; 2242 AND 2859-2860. 

RX MEDLINE=21176376; PubMed=11277701; DOI=10.1006/viro.2000.0795;

RA Yamshchikov V.F., Wengler G., Perelygin A.A., Brinton M.A.,

RA Compans R.W.;

RT "An infectious clone of the West Nile flavivirus."; 

RL Virology 281:294-304(2001). 

RN [3] 

RP NUCLEOTIDE SEQUENCE [GENOMIC RNA] OF 1-291. 

RX MEDLINE=85274372; PubMed=2992152;

RA Castle E., Nowak T., Leidner U., Wengler G., Wengler G.;

RT "Sequence analysis of the viral core protein and the membrane-

RT associated proteins V1 and NV2 of the flavivirus West Nile virus and

RT of the genome sequence for these proteins."; 

RL Virology 145:227-236(1985). 

RN [4] 

RP NUCLEOTIDE SEQUENCE [GENOMIC RNA] OF 255-854. 

RX MEDLINE=86072082; PubMed=3855247;

RA Wengler G., Castle E., Leidner U., Nowak T., Wengler G.;

RT "Sequence analysis of the membrane protein V3 of the flavivirus West 

RT Nile virus and of its gene."; 

RL Virology 147:264-274(1985). 

RN [5] 

RP DISULFIDE BONDS IN E PROTEIN. 

RX MEDLINE=87122143; PubMed=3811228;

RA Nowak T., Wengler G.;

RT "Analysis of disulfides present in the membrane proteins of the West 

RT Nile flavivirus."; 

RL Virology 156:127-137(1987). 

CC -!- FUNCTION: The small proteins NS2A, NS4A and NS4B are hydrophobic, 
CC suggesting a possible membrane-related function. NS5 may play a 

CC role in the viral RNA replication. The NS2B/NS3 protease complex 

CC processes the viral polyprotein. 

CC -!- CATALYTIC ACTIVITY: Selective hydrolysis of -Xaa-Xaa-|-Yaa- bonds
CC in which each of the Xaa can be either Arg or Lys and Yaa can be 

CC either Ser or Ala. 



CC -!- CATALYTIC ACTIVITY: Nucleoside triphosphate + RNA(n) = diphosphate

CC + RNA(n+1).

CC -!- SUBUNIT: NS3 and NS2B form a heterodimer. NS3 is the catalytic
CC subunit, whereas NS2B strongly stimulates the latter (By

CC similarity).

CC -!- PTM: Specific enzymatic cleavages in vivo yield mature proteins
CC (By similarity).

CC -!- MISCELLANEOUS: The virion of this virus is a nucleocapsid covered 
CC by a lipoprotein envelope. The envelope contains two proteins: the 

CC protein M and glycoprotein E. The nucleocapsid is a complex of 

CC protein C and mRNA. In immature particles, there are 60 

CC icosaedrally organized trimeric spikes on the surface. Each spike 

CC consists of three heterodimers of envelope protein M precursor 

CC (prM) and envelope protein E (By similarity).

CC -!- SIMILARITY: Contains 1 peptidase S7 domain. 

CC -!- SIMILARITY: Contains 1 RdRp catalytic domain. 

CC 

CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms 

CC Distributed under the Creative Commons Attribution-NoDerivs License 

CC 

DR EMBL; M12294; AAA48498.2; -; Genomic_RNA. 



DR PIR; A25256; GNWVWV.
DR HSSP; Q88653; 1L9K.
DR SMR; P06935; 25-97.
DR MEROPS; S07.001;
DR InterPro; IPR001410; DEAD.
DR InterPro; IPR011545; DEAD/DEAH_N.
DR InterPro; IPR002464; DEAH_box.
DR InterPro; IPR011999; Flav_glyE_cen_dm.
DR InterPro; IPR001122; Flavi_capsidC.
DR InterPro; IPR011492; Flavi_DEAD.
DR InterPro; IPR000069; Flavi_M.
DR InterPro; IPR001157; Flavi_NS1.
DR InterPro; IPR000752; Flavi_NS2A.
DR InterPro; IPR000487; Flavi_NS2B.
DR InterPro; IPR000404; Flavi_NS4A.
DR InterPro; IPR001528; Flavi_NS4B.
DR InterPro; IPR000208; Flavi_NS5.
DR InterPro; IPR002535; Flavi_propep.
DR InterPro; IPR000336; Flv_glyE_Ig-like.
DR InterPro; IPR001650; Helicase_C.
DR InterPro; IPR001850; Peptidase_S7.
DR InterPro; IPR007095; RNA_pol_DS_PS.
DR InterPro; IPR007094; RNA_pol_PSvir.
DR InterPro; IPR002877; RrmJFtsJ_mtfrase.
DR InterPro; IPR011998; Vrl_glyE_cen_dim.
DR Pfam; PF01003; Flavi_capsid; 1.
DR Pfam; PF07652; Flavi_DEAD; 1.
DR Pfam; PF02832; Flavi_glycop_C; 1.
DR Pfam; PF00869; Flavi_glycoprot; 1.
DR Pfam; PF01004; Flavi_M; 1.
DR Pfam; PF00948; Flavi_NS1; 1.
DR Pfam; PF01005; Flavi_NS2A; 1.
DR Pfam; PF01002; Flavi_NS2B; 1.
DR Pfam; PF01350; Flavi_NS4A; 1.
DR Pfam; PF01349; Flavi_NS4B; 1.
DR Pfam; PF00972; Flavi_NS5; 1.
DR Pfam; PF01570; Flavi_propep; 1.
DR Pfam; PF01728; FtsJ; 1.
DR Pfam; PF00271; Helicase_C; 1.
DR Pfam; PF00949; Peptidase_S7; 1.
DR ProDom; PD001496; Flavi_NS1; 1.
DR SMART; SM00487; DEXDc; 1.
DR SMART; SM00490; HELICc; 1.
DR PROSITE; PS00690; DEAH_ATP_HELICASE; FALSE_NEG.
DR PROSITE; PS50507; RDRP_SSRNA_POS; 1.
KW ATP-binding; Capsid protein; Core protein; Envelope protein;
KW Glycoprotein; Helicase; Hydrolase; Membrane; Nucleotide-binding;
KW Nucleotidyltransferase; Polyprotein; RNA-directed RNA polymerase;
KW Structural protein; Transferase; Transmembrane.


FT 1
FT /FTId=PRO_0000037743.
FT INIT_MET 1 1 Removed from capsid protein C by the
FT cellular aminopeptidase.
FT PROPEP 124 215
FT /FTId=PRO_0000037744.
FT CHAIN 216 290
FT /FTId=PRO_0000037745.
FT CHAIN 291 787 Major envelope protein E.
FT /FTId=PRO_0000037746.
FT CHAIN 788 1139 Nonstructural protein 1.
FT /FTId=PRO_0000037747.
FT CHAIN 1140 1370 Nonstructural protein 2A.
FT /FTId=PRO_0000037748.
FT CHAIN 1371 1501 Flavivirin protease NS2B regulatory
FT subunit.
FT /FTId=PRO_0000037749.
FT CHAIN 1502 2120 Flavivirin protease NS3 catalytic
FT subunit.
FT /FTId=PRO_0000037750.
FT CHAIN 2121 2269 Nonstructural protein 4A.
FT /FTId=PRO_0000037751.
FT CHAIN 2270 2525 Nonstructural protein 4B.
FT /FTId=PRO_0000037752.
FT CHAIN 2526 3430 RNA-directed RNA polymerase.
FT /FTId=PRO_0000037753.
FT DOMAIN 1508 1679 Peptidase S7.
FT DOMAIN 3055 3207 RdRp catalytic.
FT NP_BIND 1695 1702 ATP (Potential).
FT REGION 388 401 Involved in fusion.
FT MOTIF 1786 1789 DEAH box.
FT ACT_SITE 1552 1552 Charge relay system (By similarity).
FT ACT_SITE 1576 1576 Charge relay system (By similarity).
FT ACT_SITE 1636 1636 Charge relay system (By similarity).
FT CARBOHYD 138 138 N-linked (GlcNAc...) (Potential).
FT CARBOHYD 917 917 N-linked (GlcNAc...) (Potential).
FT CARBOHYD 962 962 N-linked (GlcNAc...) (Potential).
FT CARBOHYD 994 994 N-linked (GlcNAc...) (Potential).
FT CARBOHYD 1289 1289 N-linked (GlcNAc...) (Potential).
FT CARBOHYD 2336 2336 N-linked (GlcNAc...) (Potential).
FT CARBOHYD 2489 2489 N-linked (GlcNAc...) (Potential).
FT DISULFID 293 320
FT DISULFID 350 406
FT DISULFID 364 395







FT DISULFID 382 411 

FT DISULFID 476 574 

FT DISULFID 591 622 

SQ SEQUENCE 3430 AA; 380110 MW; 42D71B7CB12DC45B CRC64 ; 

Query Match 97.5%; Score 2572; DB 1; Length 3430; 

Best Local Similarity 98.2%; Pred. No. 2.9e-186; 

Matches 492; Conservative 2; Mismatches 3; Indels 4; Gaps 1; 

Qy 1 FNCLGMSNRDFLEGVSGATWDLVTjEGDSCWIMSKDK^ 60 

I I II I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I 
Db 291 FNCLGMSNRDFLEGVSGATWDLVXEGDSCWIMSKDKPTIDV1<MMNMEAANLADv^ 350 

Qy 61 YLASVSDLSTKAACPTMGEAHNEKRADPAFVCKQGWDRGWGNGCGLFGKGSIDTCAKFA 120 

I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 351 YLASVSDLSTRAACPTMGEAHNEKRADPAFVCKQGWDRGWGNGCGLFGKGSIDTCAKFA 410 

Qy 121 CTTKATGWIIQKENIKYEVAIFVHGPTTVESHGNYSTQIGATQAGRFSITPSAPSYTLKL 180 

I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I 

Db 411 CTTKATGWIIQKENIKYEVAIFVHGPTTVESHG----KIGATQAGRFSITPSAPSYTLKL 466

Qy 181 GEYGEVTVDCEPRSGIDTSAYYVMSVGAKSFLVHREWFMDLNLPWSSAGSTTWRNRETLM 240

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I II I I I I I I I I I I I I I
Db 467 GEYGEVTVDCEPRSGIDTSAYYVMSVGEKSFLVHREWFMDLNLPWSSAGSTTWRNRETLM 526

Qy 241 EFEEPHATKQSWALGSQEGALHQALAGAIPVEFSSNTVKLTSGHLKCRVKMEKLQLKGT 300

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I

Db 527 EFEEPHATKQSVVALGSQEGALHQALAGAIPVEFSSNTVKLTSGHLKCRVKMEKLQLKGT 586

Qy 301 TYGVCSKAFKFARTPADTGHGTVVLELQYTGKDGPCKVPISSVASLNDLTPVGRLVTVNP 360 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 587 TYGVCSKAFKFARTPADTGHGTVVLELQYTGTDGPCKVPISSVASLNDLTPVGRLVTVNP 646 

Qy 361 FVSVATANSKVLIELEPPFSDSYIWGRGEQQINHHWHKSGSSIGKAFTTTLRGAQRLAA 420 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I 
Db 647 FVSVATANSKVLIELEPPFGDSYIWGRGEQQINHHWHKSGSSIGKAFTTTLRGAQRLAA 706 

Qy 421 LGDTAWDFGSVGGVFTSVGKAIHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 480 

I I I I I I I II I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 
Db 707 LGDTAWDFGSVGGVFTSVGKAIHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 766 

Qy 481 IAMTFLAVGGVLLFLSVNVHA 501 

I I I I I I I I I I I I I I I I I I I I I 
Db 767 IAMTFLAVGGVLLFLSVNVHA 787 
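
The "Indels 4; Gaps 1" figures for this result can be cross-checked from the row coordinates alone. The Python sketch below is illustrative only: the coordinate pairs are transcribed by hand from the Qy and Db rows of RESULT 6, and the helper name is hypothetical. A Db row that covers fewer residues than its Qy row must contain gap positions.

```python
# (start, end) of each alignment row in RESULT 6, transcribed from the report.
QY_ROWS = [(1, 60), (61, 120), (121, 180), (181, 240), (241, 300),
           (301, 360), (361, 420), (421, 480), (481, 501)]
DB_ROWS = [(291, 350), (351, 410), (411, 466), (467, 526), (527, 586),
           (587, 646), (647, 706), (707, 766), (767, 787)]

def row_lengths(rows):
    """Number of residues covered by each (start, end) row."""
    return [end - start + 1 for start, end in rows]

# Residues present in the query row but absent from the database row.
shortfalls = [q - d for q, d in zip(row_lengths(QY_ROWS), row_lengths(DB_ROWS))]
print(shortfalls)
```

Only the third row (Qy 121-180 against Db 411-466) falls short, by four residues, which agrees with a single gap of four indel positions.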



RESULT 7 
Q5EVN3_WNV

ID Q5EVN3_WNV PRELIMINARY; PRT; 3433 AA.

AC Q5EVN3;

DT 15-MAR-2005, integrated into UniProtKB/TrEMBL.

DT 15-MAR-2005, sequence version 1.

DT 07-FEB-2006, entry version 5.

DE Polyprotein.

OS West Nile virus (WN).

OC Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;

OC Flavivirus; Japanese encephalitis virus group.



OX NCBI_TaxID=11082; 

RN [1] 

RP NUCLEOTIDE SEQUENCE. 

RC STRAIN=96-111; 

RX PubMed=15752452; 

RA Schuffenecker I., Peyrefitte C.N., el Harrak M., Murri S., Leblond A.,

RA Zeller H.G.;

RT "West Nile Virus, in Morocco, 2003."; 

RL Emerg. Infect. Dis. 11:306-309(2005). 

CC 

CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms 

CC Distributed under the Creative Commons Attribution-NoDerivs License 

CC 

DR EMBL; AY701412; AAT92098.1; -; Genomic_RNA. 

DR SMR; Q5EVN3; 25-97.

DR GO; GO:0016021; C:integral to membrane; IEA.

DR GO; GO:0019028; C:viral capsid; IEA.

DR GO; GO:0019031; C:viral envelope; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0008026; F:ATP-dependent helicase activity; IEA.

DR GO; GO:0003725; F:double-stranded RNA binding; IEA.

DR GO; GO:0003724; F:RNA helicase activity; IEA.

DR GO; GO:0003968; F:RNA-directed RNA polymerase activity; IEA.

DR GO; GO:0004252; F:serine-type endopeptidase activity; IEA.

DR GO; GO:0005198; F:structural molecule activity; IEA.

DR GO; GO:0019079; P:viral genome replication; IEA.

DR InterPro; IPR001410; DEAD. 

DR InterPro; IPR011545; DEAD/DEAH_N.

DR InterPro; IPR011999; Flav_glyE_cen_dm.

DR InterPro; IPR001122; Flavi_capsidC.

DR InterPro; IPR011492; Flavi_DEAD.

DR InterPro; IPR000069; Flavi_M.

DR InterPro; IPR001157; Flavi_NS1.

DR InterPro; IPR000752; Flavi_NS2A.

DR InterPro; IPR000487; Flavi_NS2B.

DR InterPro; IPR000404; Flavi_NS4A.

DR InterPro; IPR001528; Flavi_NS4B.

DR InterPro; IPR000208; Flavi_NS5.

DR InterPro; IPR002535; Flavi_propep.

DR InterPro; IPR000336; Flv_glyE_Ig-like.

DR InterPro; IPR001650; Helicase_C.

DR InterPro; IPR001850; Peptidase_S7.

DR InterPro; IPR007095; RNA_pol_DS_PS.

DR InterPro; IPR007094; RNA_pol_PSvir.

DR InterPro; IPR002877; RrmJFtsJ_mtfrase.

DR InterPro; IPR011998; Vrl_glyE_cen_dim.

DR InterPro; IPR001680; WD40.

DR Pfam; PF01003; Flavi_capsid; 1.

DR Pfam; PF07652; Flavi_DEAD; 1.

DR Pfam; PF02832; Flavi_glycop_C; 1.

DR Pfam; PF00869; Flavi_glycoprot; 1.

DR Pfam; PF01004; Flavi_M; 1.

DR Pfam; PF00948; Flavi_NS1; 1.

DR Pfam; PF01005; Flavi_NS2A; 1.

DR Pfam; PF01002; Flavi_NS2B; 1.

DR Pfam; PF01350; Flavi_NS4A; 1.

DR Pfam; PF01349; Flavi_NS4B; 1.

DR Pfam; PF00972; Flavi_NS5; 1.

DR Pfam; PF01570; Flavi_propep; 1.

DR Pfam; PF01728; FtsJ; 1.

DR Pfam; PF00271; Helicase_C; 1.

DR Pfam; PF00949; Peptidase_S7; 1.

DR ProDom; PD001496; Flavi_NS1; 1.

DR SMART; SM00487; DEXDc; 1.

DR SMART; SM00490; HELICc; 1.

DR PROSITE; PS00678; WD_REPEATS_1; UNKNOWN_1.

KW Polyprotein. 

SQ SEQUENCE 3433 AA; 381249 MW; 7ECC96DBFD9D53DA CRC64; 
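
The DR (database cross-reference) lines in the entry above follow a simple pattern: a two-letter line code, then semicolon-separated fields ending in a period. A minimal Python sketch of splitting such a line is given below. It assumes only that pattern as it appears in this report; `parse_dr` is a hypothetical helper, not part of any UniProt tool.

```python
def parse_dr(line):
    """Split a 'DR' line into (database, [remaining fields])."""
    code, _, body = line.partition(" ")
    if code != "DR":
        raise ValueError("not a DR line")
    # Drop the terminating period, then split on semicolons.
    fields = [f.strip() for f in body.strip().rstrip(".").split(";")]
    fields = [f for f in fields if f]
    return fields[0], fields[1:]

database, fields = parse_dr("DR Pfam; PF01003; Flavi_capsid; 1.")
print(database, fields)
```

Periods inside a field, such as the version suffix in AAT92098.1, are untouched because only the trailing period of the line is removed.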



Query Match 96.0%; Score 2532; DB 2; Length 3433; 

Best Local Similarity 95.6%; Pred. No. 3.2e-183; 

Matches 479; Conservative 13; Mismatches 9; Indels 0; Gaps 0; 



Qy    1 FNCLGMSNRDFLEGVSGATWVDLVLEGDSCVTIMSKDKPTIDVKMMNMEAANLADVRSYC 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||:|||||
Db  291 FNCLGMSNRDFLEGVSGATWVDLVLEGDSCVTIMSKDKPTIDVK 350

Qy   61 YLASVSDLSTKAACPTMGEAHNEKRADPAFVCKQGVVDRGWGNGCGLFGKGSIDTCAKFA 120
        |||:||||||||||||||||||:|||||||||:|||||||||||||||||||||||||||
Db  351 YLATVSDLSTKAACPTMGEAHNDKRADPAFVCRQGVVDRGWGNGCGLFGKGSIDTCAKFA 410

Qy  121 CTTKATGWIIQKENIKYEVAIFVHGPTTVESHGNYSTQIGATQAGRFSITPSAPSYTLKL 180
        |:|||||  | ||||||||||||||||||||||||||||||||||||||||:||||||||
Db  411 CSTKATGRTILKENIKYEVAIFVHGPTTVESHGNYSTQIGATQAGRFSITPAAPSYTLKL 470

Qy  181 GEYGEVTVDCEPRSGIDTSAYYVMSVGAKSFLVHREWFMDLNLPWSSAGSTTWRNRETLM 240
        ||||||||||||||||||:|||||:|| |:||||||||||||||||||||| ||||||||
Db  471 GEYGEVTVDCEPRSGIDTNAYYVMTVGTKTF 530

Qy  241 EFEEPHATKQSVVALGSQEGALHQALAGAIPVEFSSNTVKLTSGHLKCRVKMEKLQLKGT 300
        ||||||||||||:|||||||||||||||||||||||||||||||||||||||||||||||
Db  531 EFEEPHATKQSVIALGSQEGALHQALAGAIPVEFSSNTVKLTSGHLKCRVKMEKLQLKGT 590

Qy  301 TYGVCSKAFKFARTPADTGHGTVVLELQYTGKDGPCKVPISSVASLNDLTPVGRLVTVNP 360
        |||||||||||  |||||||||||||||||| ||||||||||||||||||||||||||||
Db  591 TYGVCSKAFKFLGTPADTGHGTVVLELQYTGTDGPCKVPISSVASLNDLTPVGRLVTVNP 650

Qy  361 FVSVATANSKVLIELEPPFSDSYIVVGRGEQQINHHWHKSGSSIGKAFTTTLRGAQRLAA 420
        ||||||||:|||||||||| |||||||||||||||||||||||||||||||:||||||||
Db  651 FVSVATANAKVLIELEPPFGDSYIVVGRGEQQINHHWHKSGSSIGKAFTTTLKGAQRLAA 710

Qy  421 LGDTAWDFGSVGGVFTSVGKAIHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  711 LGDTAWDFGSVGGVFTSVGKAIHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 770

Qy  481 IAMTFLAVGGVLLFLSVNVHA 501
        ||:||||||||||||||||||
Db  771 IALTFLAVGGVLLFLSVNVHA 791





RESULT 8 
Q9WI84_WNV 

ID Q9WI84_WNV PRELIMINARY; PRT; 501 AA. 

AC Q9WI84; 



DT 01-NOV-1999, integrated into UniProtKB/TrEMBL.

DT 01-NOV-1999, sequence version 1. 

DT 07-FEB-2006, entry version 20. 

DE Envelope glycoprotein (Fragment).

OS West Nile virus (WN).

OC Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;

OC Flavivirus; Japanese encephalitis virus group.

OX NCBI_TaxID=11082; 

RN [1] 

RP NUCLEOTIDE SEQUENCE. 

RC STRAIN=KN3829; 

RX MEDLINE=20271587; PubMed=10813479; 

RA Miller B.R., Nasci R.S., Godsey M.S., Savage H.M., Lutwama J. J., 

RA Lanciotti R.S., Peters C.J.; 

RT "First field evidence for natural vertical transmission of West Nile 

RT virus in Culex univittatus complex mosquitoes from Rift Valley 

RT province, Kenya.";

RL Am. J. Trop. Med. Hyg. 62:240-246(2000). 

CC 

CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms 

CC Distributed under the Creative Commons Attribution-NoDerivs License 

CC 

DR EMBL; AF146082; AAD31720.1; -; Genomic_RNA. 

DR HSSP; Q88653; 1OKE.

DR GO; GO:0016021; C:integral to membrane; IEA.

DR GO; GO:0019031; C:viral envelope; IEA.

DR GO; GO:0005198; F:structural molecule activity; IEA.

DR InterPro; IPR011999; Flav_glyE_cen_dm.

DR InterPro; IPR000336; Flv_glyE_Ig-like.

DR InterPro; IPR011998; Vrl_glyE_cen_dim. 

DR Pfam; PF02832; Flavi_glycop_C; 1. 

DR Pfam; PF00869; Flavi_glycoprot; 1.

KW Envelope protein.

FT NON_TER 1 1

FT NON_TER 501 501 

SQ SEQUENCE 501 AA; 53622 MW; D2A9C827F71C00D5 CRC64; 

Query Match 95.9%; Score 2531; DB 2; Length 501; 

Best Local Similarity 95.4%; Pred. No. 2.8e-184; 

Matches 478; Conservative 14; Mismatches 9; Indels 0; Gaps 0; 

Qy    1 FNCLGMSNRDFLEGVSGATWVDLVLEGDSCVTIMSKDKPTIDVKMMNMEAANLADVRSYC 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||:|||||
Db    1 FNCLGMSNRDFLEGVSGATWVDLVLEGDSCVTIMSKDKPTIDVKMMNMEAANLA 60

Qy   61 YLASVSDLSTKAACPTMGEAHNEKRADPAFVCKQGVVDRGWGNGCGLFGKGSIDTCAKFA 120
        |||:||||||||||||||||||:|||||||||:|||||||||||||||||||||||||||
Db   61 YLATVSDLSTKAACPTMGEAHNDKRADPAFVCRQGVVDRGWGNGCGLFGKGSIDTCAKFA 120

Qy  121 CTTKATGWIIQKENIKYEVAIFVHGPTTVESHGNYSTQIGATQAGRFSITPSAPSYTLKL 180
        |:|||||  | ||||||||||||||||||||||||||||||||||||||||:||||||||
Db  121 CSTKATGRTILKENIKYEVAIFVHGPTTVESHGNYSTQIGATQAGRFSITPAAPSYTLKL 180

Qy  181 GEYGEVTVDCEPRSGIDTSAYYVMSVGAKSFLVHREWFMDLNLPWSSAGSTTWRNRETLM 240
        ||||||||||||||||||:|||||:|| |:||||||||||||||||||||| ||||||||
Db  181 GEYGEVTVDCEPRSGIDTNAYYVMTVGTKTFLVHREWFMDLNLPWSSAGSTVWRNRETLM 240

Qy  241 EFEEPHATKQSVVALGSQEGALHQALAGAIPVEFSSNTVKLTSGHLKCRVKMEKLQLKGT 300
        ||||||||||||:|||||||||||||||||||||||||||||||||||||||||||||||
Db  241 EFEEPHATKQSVIALGSQEGALHQALAGAIPVEFSSNTVKLTSGHLKCRVKMEKLQLKGT 300

Qy  301 TYGVCSKAFKFARTPADTGHGTVVLELQYTGKDGPCKVPISSVASLNDLTPVGRLVTVNP 360
        |||||||||||  |||||||||||||||||| ||||||||||||||||||||||||||||
Db  301 TYGVCSKAFKFLGTPADTGHGTVVLELQYTGTDGPCKVPISSVASLNDLTPVGRLVTVNP 360

Qy  361 FVSVATANSKVLIELEPPFSDSYIVVGRGEQQINHHWHKSGSSIGKAFTTTLRGAQRLAA 420
        ||||||||:|||||||||| |||||||||||||||||||||||||||||||:||||||||
Db  361 FVSVATANAKVLIELEPPFGDSYIVVGRGEQQINHHWHKSGSSIGKAFTTTLKGAQRLAA 420

Qy  421 LGDTAWDFGSVGGVFTSVGKAIHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 480
        |||||||||||||||||||||:||||||||||||||||||||||||||||||||||||||
Db  421 LGDTAWDFGSVGGVFTSVGKAVHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 480

Qy  481 IAMTFLAVGGVLLFLSVNVHA 501
        ||:||||||||||||||||||
Db  481 IALTFLAVGGVLLFLSVNVHA 501



RESULT 9 
Q9WHD1_WNV 

ID Q9WHD1_WNV PRELIMINARY; PRT; 773 AA. 

AC Q9WHD1; 

DT 01-NOV-1999, integrated into UniProtKB/TrEMBL.

DT 01-NOV-1999, sequence version 1. 

DT 07-FEB-2006, entry version 24. 

DE Polyprotein (Fragment).

OS West Nile virus (WN).

OC Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;

OC Flavivirus; Japanese encephalitis virus group.

OX NCBI_TaxID=11082;

RN [1] 

RP NUCLEOTIDE SEQUENCE. 

RC STRAIN=RO97-50; 

RX MEDLINE=20014331; PubMed=10548295; 

RA Savage H.M., Ceianu C., Nicolescu G., Karabatsos N., Lanciotti R.,

RA Vladimirescu A., Laiv L., Ungureanu A., Romanca C., Tsai T.F.;

RT "Entomologic and avian investigations of an epidemic of West Nile

RT fever in Romania in 1996, with serologic and molecular

RT characterization of a virus isolate from mosquitoes.";

RL Am. J. Trop. Med. Hyg. 61:600-611(1999). 

RN [2] 

RP NUCLEOTIDE SEQUENCE. 

RC STRAIN=RO97-50; 

RA Lanciotti R.L., Ludwig M.L., Savage H.M.;

RL Submitted (FEB-1999) to the EMBL/GenBank/DDBJ databases.

CC 

CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms 

CC Distributed under the Creative Commons Attribution-NoDerivs License 

CC 

DR EMBL; AF130362; AAD28623.1; -; Genomic_RNA. 

DR HSSP; Q88653; 1OKE.

DR SMR; Q9WHD1; 1-72.

DR GO; GO:0016021; C:integral to membrane; IEA.

DR GO; GO:0019028; C:viral capsid; IEA.

DR GO; GO:0019031; C:viral envelope; IEA.

DR GO; GO:0005198; F:structural molecule activity; IEA.

DR GO; GO:0019058; P:viral infectious cycle; IEA.

DR InterPro; IPR011999; Flav_glyE_cen_dm.

DR InterPro; IPR001122; Flavi_capsidC.

DR InterPro; IPR000069; Flavi_M.

DR InterPro; IPR002535; Flavi_propep.

DR InterPro; IPR000336; Flv_glyE_Ig-like.

DR InterPro; IPR011998; Vrl_glyE_cen_dim.

DR Pfam; PF01003; Flavi_capsid; 1.

DR Pfam; PF02832; Flavi_glycop_C; 1.

DR Pfam; PF00869; Flavi_glycoprot; 1.

DR Pfam; PF01004; Flavi_M; 1.

DR Pfam; PF01570; Flavi_propep; 1.



KW Polyprotein.

FT CHAIN <1 88 capsid protein.

FT CHAIN 89 265 pre-membrane/membrane protein.

FT CHAIN 266 766 envelope glycoprotein.

FT NON_TER 1 1

FT NON_TER 773 773

SQ SEQUENCE 773 AA; 83364 MW; 2C33EA27EC676EE7 CRC64;



Query Match 95.9%; Score 2531; DB 2; Length 773; 

Best Local Similarity 95.4%; Pred. No. 5.1e-184; 

Matches 478; Conservative 14; Mismatches 9; Indels 0; Gaps 0;



Qy    1 FNCLGMSNRDFLEGVSGATWVDLVLEGDSCVTIMSKDKPTIDVKMMNMEAANLADVRSYC 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||:|||||
Db  266 FNCLGMSNRDFLEGVSGATWVDLVLEGDSCVTIMSKDKPTIDVKMMNMEAANL 325

Qy   61 YLASVSDLSTKAACPTMGEAHNEKRADPAFVCKQGVVDRGWGNGCGLFGKGSIDTCAKFA 120
        |||:||||||||||||||||||:|||||||||:|||||||||||||||||||||||||||
Db  326 YLATVSDLSTKAACPTMGEAHNDKRADPAFVCRQGVVDRGWGNGCGLFGKGSIDTCAKFA 385

Qy  121 CTTKATGWIIQKENIKYEVAIFVHGPTTVESHGNYSTQIGATQAGRFSITPSAPSYTLKL 180
        |:|||||  | ||||||||||||||||||||||||||||||||||||||||:||||||||
Db  386 CSTKATGRTILKENIKYEVAIFVHGPTTVESHGNYSTQIGATQAGRFSITPAAPSYTLKL 445

Qy  181 GEYGEVTVDCEPRSGIDTSAYYVMSVGAKSFLVHREWFMDLNLPWSSAGSTTWRNRETLM 240
        ||||||||||||||||||:|||||:|| |:||||||||||||||||||||| ||||||||
Db  446 GEYGEVTVDCEPRSGIDTNAYYVMTVGTKTFLVHREWFMDLNLPWSSAGSTVWRNRETLM 505

Qy  241 EFEEPHATKQSVVALGSQEGALHQALAGAIPVEFSSNTVKLTSGHLKCRVKMEKLQLKGT 300
        ||||||||||||:|||||||||||||||||||||||||||||||||||||||||||||||
Db  506 EFEEPHATKQSVIALGSQEGALHQALAGAIPVEFSSNTVKLTSGHLKCRVKMEKLQLKGT 565

Qy  301 TYGVCSKAFKFARTPADTGHGTVVLELQYTGKDGPCKVPISSVASLNDLTPVGRLVTVNP 360
        |||||||||||  |||||||||||||||||| ||||||||||||||||||||||||||||
Db  566 TYGVCSKAFKFLGTPADTGHGTVVLELQYTGTDGPCKVPISSVASLNDLTPVGRLVTVNP 625

Qy  361 FVSVATANSKVLIELEPPFSDSYIVVGRGEQQINHHWHKSGSSIGKAFTTTLRGAQRLAA 420
        ||||||||:|||||||||| |||||||||||||||||||||||||||||||:||||||||
Db  626 FVSVATANAKVLIELEPPFGDSYIVVGRGEQQINHHWHKSGSSIGKAFTTTLKGAQRLAA 685

Qy  421 LGDTAWDFGSVGGVFTSVGKAIHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 480
        |||||||||||||||||||||:||||||||||||||||||||||||||||||||||||||
Db  686 LGDTAWDFGSVGGVFTSVGKAVHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 745

Qy  481 IAMTFLAVGGVLLFLSVNVHA 501
        ||:||||||||||||||||||
Db  746 IALTFLAVGGVLLFLSVNVHA 766



RESULT 10 
Q5EVN2_WNV 

ID Q5EVN2_WNV PRELIMINARY; PRT; 3433 AA. 

AC Q5EVN2; 

DT 15-MAR-2005, integrated into UniProtKB/TrEMBL.

DT 15-MAR-2005, sequence version 1. 

DT 07-FEB-2006, entry version 5. 

DE Polyprotein. 

OS West Nile virus (WN).

OC Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;

OC Flavivirus; Japanese encephalitis virus group.

OX NCBI_TaxID=11082;

RN [1] 

RP NUCLEOTIDE SEQUENCE. 

RC STRAIN=04.05; 

RX PubMed=15752452;

RA Schuffenecker I., Peyrefitte C.N., el Harrak M., Murri S., Leblond A.,

RA Zeller H.G.;

RT "West Nile Virus in Morocco, 2003.";

RL Emerg. Infect. Dis. 11:306-309(2005).

CC 

CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms 

CC Distributed under the Creative Commons Attribution-NoDerivs License 

CC 

DR EMBL; AY701413; AAT92099.1; -; Genomic_RNA. 

DR SMR; Q5EVN2; 25-97.

DR GO; GO:0016021; C:integral to membrane; IEA.

DR GO; GO:0019028; C:viral capsid; IEA.

DR GO; GO:0019031; C:viral envelope; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0008026; F:ATP-dependent helicase activity; IEA.

DR GO; GO:0003725; F:double-stranded RNA binding; IEA.

DR GO; GO:0003724; F:RNA helicase activity; IEA.

DR GO; GO:0003968; F:RNA-directed RNA polymerase activity; IEA.

DR GO; GO:0004252; F:serine-type endopeptidase activity; IEA.

DR GO; GO:0005198; F:structural molecule activity; IEA.

DR GO; GO:0019079; P:viral genome replication; IEA.

DR InterPro; IPR001410; DEAD.

DR InterPro; IPR011545; DEAD/DEAH_N.

DR InterPro; IPR011999; Flav_glyE_cen_dm.

DR InterPro; IPR001122; Flavi_capsidC.

DR InterPro; IPR011492; Flavi_DEAD.

DR InterPro; IPR000069; Flavi_M.

DR InterPro; IPR001157; Flavi_NS1.

DR InterPro; IPR000752; Flavi_NS2A.

DR InterPro; IPR000487; Flavi_NS2B.

DR InterPro; IPR000404; Flavi_NS4A.

DR InterPro; IPR001528; Flavi_NS4B.

DR InterPro; IPR000208; Flavi_NS5.

DR InterPro; IPR002535; Flavi_propep.

DR InterPro; IPR000336; Flv_glyE_Ig-like.



DR InterPro; IPR001650; Helicase_C.

DR InterPro; IPR001850; Peptidase_S7.

DR InterPro; IPR007095; RNA_pol_DS_PS.

DR InterPro; IPR007094; RNA_pol_PSvir.

DR InterPro; IPR002877; RrmJFtsJ_mtfrase.

DR InterPro; IPR011998; Vrl_glyE_cen_dim.

DR InterPro; IPR001680; WD40.

DR Pfam; PF01003; Flavi_capsid; 1.

DR Pfam; PF07652; Flavi_DEAD; 1.

DR Pfam; PF02832; Flavi_glycop_C; 1.

DR Pfam; PF00869; Flavi_glycoprot; 1.

DR Pfam; PF01004; Flavi_M; 1.

DR Pfam; PF00948; Flavi_NS1; 1.

DR Pfam; PF01005; Flavi_NS2A; 1.

DR Pfam; PF01002; Flavi_NS2B; 1.

DR Pfam; PF01350; Flavi_NS4A; 1.

DR Pfam; PF01349; Flavi_NS4B; 1.

DR Pfam; PF00972; Flavi_NS5; 1.

DR Pfam; PF01570; Flavi_propep; 1.

DR Pfam; PF01728; FtsJ; 1.

DR Pfam; PF00271; Helicase_C; 1.

DR Pfam; PF00949; Peptidase_S7; 1.

DR ProDom; PD001496; Flavi_NS1; 1.

DR SMART; SM00487; DEXDc; 1.

DR SMART; SM00490; HELICc; 1.

DR PROSITE; PS00678; WD_REPEATS_1; UNKNOWN_1.

KW Polyprotein.

SQ SEQUENCE 3433 AA; 381202 MW; A98222C50069232A CRC64;



Query Match 95.9%; Score 2531; DB 2; Length 3433; 

Best Local Similarity 95.4%; Pred. No. 3.8e-183; 

Matches 478; Conservative 14; Mismatches 9; Indels 0; Gaps 0;



Qy    1 FNCLGMSNRDFLEGVSGATWVDLVLEGDSCVTIMSKDKPTIDVKMMNMEAANLADVRSYC 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||:|||||
Db  291 FNCLGMSNRDFLEGVSGATWVDLVLEGDSCVTIMSKDKPTIDVKMMNMEAANLAEVRSYC 350

Qy   61 YLASVSDLSTKAACPTMGEAHNEKRADPAFVCKQGVVDRGWGNGCGLFGKGSIDTCAKFA 120
        |||:||||||||||||||||||:|||||||||:|||||||||||||||||||||||||||
Db  351 YLATVSDLSTKAACPTMGEAHNDKRADPAFVCRQGVVDRGWGNGCGLFGKGSIDTCAKFA 410

Qy  121 CTTKATGWIIQKENIKYEVAIFVHGPTTVESHGNYSTQIGATQAGRFSITPSAPSYTLKL 180
        |:|||||  | ||||||||||||||||||||||||||||||||||||||||:||||||||
Db  411 CSTKATGRTILKENIKYEVAIFVHGPTTVESHGNYSTQIGATQAGRFSITPAAPSYTLKL 470

Qy  181 GEYGEVTVDCEPRSGIDTSAYYVMSVGAKSFLVHREWFMDLNLPWSSAGSTTWRNRETLM 240
        ||||||||||||||||||:|||||:|| |:||||||||||||||||||||| ||||||||
Db  471 GEYGEVTVDCEPRSGIDTNAYYVMTVGTKTFLVHREWFMDLNLPWSSAGSTVWRNRETLM 530

Qy  241 EFEEPHATKQSVVALGSQEGALHQALAGAIPVEFSSNTVKLTSGHLKCRVKMEKLQLKGT 300
        ||||||||||||:|||||||||||||||||||||||||||||||||||||||||||||||
Db  531 EFEEPHATKQSVIALGSQEGALHQALAGAIPVEFSSNTVKLTSGHLKCRVKMEKLQLKGT 590

Qy  301 TYGVCSKAFKFARTPADTGHGTVVLELQYTGKDGPCKVPISSVASLNDLTPVGRLVTVNP 360
        |||||||||||  |||||||||||||||||| ||||||||||||||||||||||||||||
Db  591 TYGVCSKAFKFLGTPADTGHGTVVLELQYTGTDGPCKVPISSVASLNDLTPVGRLVTVNP 650

Qy  361 FVSVATANSKVLIELEPPFSDSYIVVGRGEQQINHHWHKSGSSIGKAFTTTLRGAQRLAA 420
        ||||||||:|||||||||| |||||||||||||||||||||||||||||||:||||||||
Db  651 FVSVATANAKVLIELEPPFGDSYIVVGRGEQQINHHWHKSGSSIGKAFTTTLKGAQRLAA 710

Qy  421 LGDTAWDFGSVGGVFTSVGKAIHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 480
        |||||||||||||||||||||:||||||||||||||||||||||||||||||||||||||
Db  711 LGDTAWDFGSVGGVFTSVGKAVHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 770

Qy  481 IAMTFLAVGGVLLFLSVNVHA 501
        ||:||||||||||||||||||
Db  771 IALTFLAVGGVLLFLSVNVHA 791

RESULT 11 
Q6WV07_WNV 

ID Q6WV07_WNV PRELIMINARY; PRT; 3433 AA. 

AC Q6WV07; 

DT 05-JUL-2004, integrated into UniProtKB/TrEMBL.

DT 05-JUL-2004, sequence version 1. 

DT 07-FEB-2006, entry version 8. 

DE Polyprotein. 

GN Name=pol;

OS West Nile virus (WN).

OC Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;

OC Flavivirus; Japanese encephalitis virus group.

OX NCBI_TaxID=11082; 

RN [1] 

RP NUCLEOTIDE SEQUENCE. 

RC STRAIN=PaAn001; 

RX MEDLINE=22949215; PubMed=14585341; DOI=10.1016/S0042-6822(03)00536-1;

RA Charrel R.N., Brault A.C., Gallian P., Lemasson J.-J., Murgue B.,

RA Murri S., Pastorino B., Zeller H., de chesse R., de Micco P.,

RA de Lamballerie X.; 

RT "Evolutionary relationship between Old World West Nile virus strains. 

RT Evidence for viral gene flow between Africa, the Middle East, and 

RT Europe."; 

RL Virology 315:381-388(2003). 

CC 

CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms 

CC Distributed under the Creative Commons Attribution-NoDerivs License 

CC

DR EMBL; AY268132; AAQ00998.1; -; Genomic_RNA. 

DR HSSP; Q9Q4T1; 1BEF. 

DR SMR; Q6WV07; 25-97. 

DR GO; GO:0016021; C:integral to membrane; IEA.

DR GO; GO:0019028; C:viral capsid; IEA.

DR GO; GO:0019031; C:viral envelope; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0008026; F:ATP-dependent helicase activity; IEA.

DR GO; GO:0003725; F:double-stranded RNA binding; IEA.

DR GO; GO:0003724; F:RNA helicase activity; IEA.

DR GO; GO:0003968; F:RNA-directed RNA polymerase activity; IEA.

DR GO; GO:0004252; F:serine-type endopeptidase activity; IEA.

DR GO; GO:0005198; F:structural molecule activity; IEA.

DR GO; GO:0019079; P:viral genome replication; IEA.

DR InterPro; IPR001410; DEAD. 

DR InterPro; IPR011545; DEAD/DEAH_N.

DR InterPro; IPR011999; Flav_glyE_cen_dm.

DR InterPro; IPR001122; Flavi_capsidC.

DR InterPro; IPR011492; Flavi_DEAD.

DR InterPro; IPR000069; Flavi_M. 

DR InterPro; IPR001157; Flavi_NS1.

DR InterPro; IPR000752; Flavi_NS2A. 

DR InterPro; IPR000487; Flavi_NS2B. 

DR InterPro; IPR000404; Flavi_NS4A. 

DR InterPro; IPR001528; Flavi_NS4B. 

DR InterPro; IPR000208; Flavi_NS5. 

DR InterPro; IPR002535; Flavi_propep.

DR InterPro; IPR000336; Flv_glyE_Ig-like.

DR InterPro; IPR001650; Helicase_C.

DR InterPro; IPR001850; Peptidase_S7.

DR InterPro; IPR007095; RNA_pol_DS_PS.

DR InterPro; IPR007094; RNA_pol_PSvir.

DR InterPro; IPR002877; RrmJFtsJ_mtfrase.

DR InterPro; IPR011998; Vrl_glyE_cen_dim. 

DR InterPro; IPR001680; WD40. 

DR Pfam; PF01003; Flavi_capsid; 1. 

DR Pfam; PF07652; Flavi_DEAD; 1.

DR Pfam; PF02832; Flavi_glycop_C; 1.

DR Pfam; PF00869; Flavi_glycoprot; 1.

DR Pfam; PF01004; Flavi_M; 1.

DR Pfam; PF00948; Flavi_NS1; 1.

DR Pfam; PF01005; Flavi_NS2A; 1. 

DR Pfam; PF01002; Flavi_NS2B; 1. 

DR Pfam; PF01350; Flavi_NS4A; 1. 

DR Pfam; PF01349; Flavi_NS4B; 1. 

DR Pfam; PF00972; Flavi_NS5; 1. 

DR Pfam; PF01570; Flavi_propep; 1. 

DR Pfam; PF01728; FtsJ; 1. 

DR Pfam; PF00271; Helicase_C; 1.

DR Pfam; PF00949; Peptidase_S7; 1.

DR ProDom; PD001496; Flavi_NS1; 1.

DR SMART; SM00487; DEXDc; 1.

DR SMART; SM00490; HELICc; 1.

DR PROSITE; PS00678; WD_REPEATS_1; UNKNOWN_1.

KW Polyprotein. 

SQ SEQUENCE 3433 AA; 381104 MW; 2F25A8012B297680 CRC64; 

Query Match 95.9%; Score 2531; DB 2; Length 3433; 

Best Local Similarity 95.4%; Pred. No. 3.8e-183; 

Matches 478; Conservative 14; Mismatches 9; Indels 0; Gaps 0; 

Qy    1 FNCLGMSNRDFLEGVSGATWVDLVLEGDSCVTIMSKDKPTIDVKMMNMEAANLADVRSYC 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||:|||||
Db  291 FNCLGMSNRDFLEGVSGATWVDLVLEGDSCVTIMSKDK 350

Qy   61 YLASVSDLSTKAACPTMGEAHNEKRADPAFVCKQGVVDRGWGNGCGLFGKGSIDTCAKFA 120
        |||:||||||||||||||||||:|||||||||:|||||||||||||||||||||||||||
Db  351 YLATVSDLSTKAACPTMGEAHNDKRADPAFVCRQGVVDRGWGNGCGLFGKGSIDTCAKFA 410

Qy  121 CTTKATGWIIQKENIKYEVAIFVHGPTTVESHGNYSTQIGATQAGRFSITPSAPSYTLKL 180
        |:|||||  | ||||||||||||||||||||||||||||||||||||||||:||||||||
Db  411 CSTKATGRTILKENIKYEVAIFVHGPTTVESHGNYSTQIGATQAGRFSITPAAPSYTLKL 470

Qy  181 GEYGEVTVDCEPRSGIDTSAYYVMSVGAKSFLVHREWFMDLNLPWSSAGSTTWRNRETLM 240
        ||||||||||||||||||:|||||:|| |:||||||||||||||||||||| ||||||||
Db  471 GEYGEVTVDCEPRSGIDTNAYYVMTVGTKTFLVHREWFMDLNLPWSSAGSTVWRNRETLM 530

Qy  241 EFEEPHATKQSVVALGSQEGALHQALAGAIPVEFSSNTVKLTSGHLKCRVKMEKLQLKGT 300
        ||||||||||||:|||||||||||||||||||||||||||||||||||||||||||||||
Db  531 EFEEPHATKQSVIALGSQEGALHQALAGAIPVEFSSNTVKLTSGHLKCRVKMEKLQLKGT 590

Qy  301 TYGVCSKAFKFARTPADTGHGTVVLELQYTGKDGPCKVPISSVASLNDLTPVGRLVTVNP 360
        |||||||||||  |||||||||||||||||| ||||||||||||||||||||||||||||
Db  591 TYGVCSKAFKFLGTPADTGHGTVVLELQYTGTDGPCKVPISSVASLNDLTPVGRLVTVNP 650

Qy  361 FVSVATANSKVLIELEPPFSDSYIVVGRGEQQINHHWHKSGSSIGKAFTTTLRGAQRLAA 420
        ||||||||:|||||||||| |||||||||||||||||||||||||||||||:||||||||
Db  651 FVSVATANAKVLIELEPPFGDSYIVVGRGEQQINHHWHKSGSSIGKAFTTTLKGAQRLAA 710

Qy  421 LGDTAWDFGSVGGVFTSVGKAIHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 480
        |||||||||||||||||||||:||||||||||||||||||||||||||||||||||||||
Db  711 LGDTAWDFGSVGGVFTSVGKAVHQVFGGAFRSLFGGMSWITQGLLGALLLWMGINARDRS 770

Qy  481 IAMTFLAVGGVLLFLSVNVHA 501
        ||:||||||||||||||||||
Db  771 IALTFLAVGGVLLFLSVNVHA 791





RESULT 12 
Q80B10_WNV 



ID Q80B10_WNV PRELIMINARY; PRT; 3433 AA. 

AC Q80B10; 

DT 01-JUN-2003, integrated into UniProtKB/TrEMBL.

DT 01-JUN-2003, sequence version 1. 

DT 07-FEB-2006, entry version 13. 

DE Polyprotein. 

OS West Nile virus (WN).

OC Viruses; ssRNA positive-strand viruses, no DNA stage; Flaviviridae;

OC Flavivirus; Japanese encephalitis virus group.

OX NCBI_TaxID=11082; 

RN [1] 

RP NUCLEOTIDE SEQUENCE. 

RC STRAIN=KN3829; 

RX MEDLINE=22949215; PubMed=14585341; DOI=10.1016/S0042-6822(03)00536-1;

RA Charrel R.N., Brault A.C., Gallian P., Lemasson J.-J., Murgue B.,

RA Murri S., Pastorino B., Zeller H., de chesse R., de Micco P.,

RA de Lamballerie X.; 

RT "Evolutionary relationship between Old World West Nile virus strains. 

RT Evidence for viral gene flow between Africa, the Middle East, and 

RT Europe.";

RL Virology 315:381-388(2003). 

CC 

CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms 

CC Distributed under the Creative Commons Attribution-NoDerivs License 



CC 

DR EMBL; AY262283; AAP20887.1; -; Genomic_RNA. 

DR HSSP; Q88653; 1L9K. 

DR SMR; Q80B10; 25-97. 

DR GO; GO:0016021; C:integral to membrane; IEA.

DR GO; GO:0019028; C:viral capsid; IEA.



